(57) ABSTRACT

A method and apparatus are provided for retrieving docu- 
ments from a collection of documents based on information 
other than the contents of a desired document. The collection 
of documents, which may be a hypertext system or docu- 
ments available via the World Wide Web, is indexed. In one 
embodiment, an indexing process of a search engine 
receives one or more specifications that identify documents, 
or document locations, and non-content information such as 
a tag word or code word. The indexing process searches the 
index to identify all documents in the index that match one 
or more of the specifications. If a match is found, the tag 
word is added to the index, and information about the 
matching document is stored in the index in association with 
the tag word. A search query is submitted to the search 
engine. The search query is automatically modified to add a 
reference to the tag word, such as a query term that will 
exclude any index entry for a document associated with the 
tag word. The search is executed against the index, and a set 
of search results is generated. Accordingly, the search results 
automatically exclude all documents associated with the tag 
word. These techniques may be used, for example, to 
implement a Web search service that produces more accurate 
search results or that prevents certain documents, such as 
pornographic materials, from appearing in search results. 

18 Claims, 10 Drawing Sheets 
METHOD AND APPARATUS FOR
RETRIEVING DOCUMENTS BASED ON 
INFORMATION OTHER THAN DOCUMENT 
CONTENT 

FIELD OF THE INVENTION
The present invention generally relates to data processing.
The invention relates more specifically to retrieving a
document from among several electronic documents based on
information not derived from the literal content of the
document.

BACKGROUND OF THE INVENTION 
Hypertext systems now enjoy wide use. One particular
hypertext system, the World Wide Web ("Web"), provides
global access over public packet-switched networks to a
large number of hypertext documents. The Web has grown
to contain a staggering number of documents, and the
number of documents continues to increase. The number of
documents available through the Web is so large that to use
the Web in a practical way almost always requires a search
service, search engine, or similar service.

Certain search engines, however, have limited utility
because the search results they produce include documents
that are not relevant to the search query. In particular, many
search engines return search results that list documents that
are not genuinely related to the search query. One reason that
search engines return such poor-quality results is that the
search engines are easy to deceive. The search engines use
"spider" programs that "crawl" to Web servers around the
world, locate documents, index the documents, and follow
hyperlinks to other documents. The index may comprise a
list of all words encountered by the "spider" in all the
documents, in which each word in the list is associated with
a reference to each of the documents that contains that word.
Unfortunately, the "spiders" cannot discriminate among
documents that genuinely use a particular word and
documents that contain the word, but are really about something
else.

For example, a Web document that contains sexually-oriented
or pornographic material may also contain one or
more words that are unrelated to the sexual material, but are
intended to cause the document to be indexed by search
engines under those words, thereby luring unsuspecting
browsers to the document. A pornographic document that
contains a decoy word intended to lure male viewers, such
as "CORVETTE," for example, followed by sexual material,
would be indexed by a search engine under the word
"CORVETTE". The decoy words may be embedded in
invisible metatags or rendered in white characters on a white
background, so as to be invisible when the document is
displayed by the browser. This practice is called "spamming"
a search engine or an indexing system. Searchers who
submit a query to the search engine or indexing system that
seeks information about the motion picture "Bambi" would
receive the pornographic page in the search results. This is
undesirable and has led to criticism of the utility of search
engines and indexing systems.

As a result, the search results returned by the search
engine often contain references to documents that are
totally unrelated, in terms of genuine content, to the scope of
a search query. In the World Wide Web context, search
engines that suffer from this problem include the Yahoo!®
Web site, the Excite® Web site, the Infoseek® Web site, and
others.

Accordingly, in this field there is a need for a system or
mechanism that can eliminate extraneous references from
search engine search results.

There is a particular need for a system or mechanism that
can combat "spamming" of an indexing system or search
engine system.

There is also a need for a mechanism that can associate 
words, search terms, or editorial matter, other than words 
appearing in the content of a document, with the document 
in an index. 

There is a particular need for such a system that can carry 
out a search for a document based on words, search terms, 
or editorial matter other than the literal content of a group of 
documents. 

SUMMARY OF THE INVENTION 

The foregoing needs, and other needs and objects that will 
become apparent from the following disclosure, are fulfilled 
by the present invention, which comprises, in one aspect, a 
method of selecting electronic documents from among a 
plurality of electronic documents, the method comprising 
the steps of storing a tag word in an index in association with 
information identifying an electronic document, in which 
the tag word comprises data that is not derived from content 
of the electronic document; receiving a search query; modi- 
fying the search query to create a modified search query by 
adding to the search query a search term that references the 
tag word; and creating a set of search results by searching 
the index based on the modified search query. 

One feature of this aspect is that the step of storing 
includes the steps of receiving data that indicates one or 
more tag words and criteria to be used to determine which 
of the plurality of documents should be associated with each 
of the one or more tag words; and storing, in the index, 
information associating each of the one or more tag words 
with the documents in the index that satisfy the criteria 
associated with the tag words. Another feature is that the 
step of storing includes the steps of receiving data that 
indicates one or more tag words and criteria to be used to 
determine which of the plurality of documents should be 
associated with each of the one or more tag words, and in 
which at least a portion of the data is expressed in a wildcard 
format; retrieving a location identifier of each of the docu- 
ments that are indexed in the index; matching each location 
identifier to each of the criteria; and when one location 
identifier matches one of the criteria, storing, in the index, 
information associating such location identifier with one or 
more of the tag words. 

In another feature, the step of storing includes the steps of 
receiving specifications of one or more of the documents 
that are indexed in the index, in which each of the specifi- 
cations is associated with one or more tag words, and in 
which one of the specifications is expressed in a wildcard 
format; retrieving a location identifier of each of the
documents that are indexed in the index; matching each location
identifier to each of the specifications by interpreting the one 
of the specifications that is in the wildcard format according 
to one or more wildcard format rules; and when one location 
identifier matches one of the specifications, storing, in the 
index, information associating such location identifier with 
one or more of the tag words. In another feature, storing 
includes the steps of storing a hash value representing the tag 
word in a record of the index; and storing an indirect 
reference to information identifying one or more of the 
documents that contain the tag word. 

Another aspect of the invention provides a method of 
restricting access to an electronic document that is stored 
among a plurality of documents, the method comprising the 
steps of storing a tag word in an index in association with 
information identifying the electronic document, in which
the tag word indicates that access to the electronic document 
is restricted; receiving a search query that requests the 
electronic document; modifying the search query to create a 
modified search query by adding a search term that excludes 
from the modified search query all documents that contain 
the tag word; and creating a set of search results by search- 
ing the index based on the modified search query. One 
feature of this aspect is that the step of modifying comprises 
the step of modifying, automatically and using a software 
component of a browser, the search query to create a 
modified search query by adding a search term that excludes 
from the modified search query all documents that contain 
the tag word. 

Another feature of this aspect is that the modified search 
query selects only those electronic documents that satisfy 
the original search query that also contain the tag word. A 
related feature is that the modified search query selects only 
those electronic documents that satisfy the original search 
query that do not contain the tag word. 
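The two query-modification features above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the "+" and "-" term prefixes, the function names `modify_query` and `search`, and the tiny in-memory index are not taken from the specification, which does not define a query syntax.

```python
from typing import Dict, List, Set


def modify_query(terms: List[str], tag_word: str, exclude: bool = True) -> List[str]:
    """Append a term that references the tag word.

    The "-term" (must not match) and "+term" (must match) prefixes are an
    assumed query syntax, not one defined by the text.
    """
    prefix = "-" if exclude else "+"
    return terms + [prefix + tag_word]


def search(index: Dict[str, Set[int]], terms: List[str]) -> Set[int]:
    """Return documents that match every positive term and no negative term."""
    required: List[Set[int]] = []
    excluded: Set[int] = set()
    for term in terms:
        if term.startswith("-"):
            excluded |= index.get(term[1:], set())
        else:
            required.append(index.get(term.lstrip("+"), set()))
    if not required:
        return set()
    return set.intersection(*required) - excluded


# Hypothetical index: documents 1-3 contain "corvette"; document 2 carries the tag word.
index = {"corvette": {1, 2, 3}, "n2h2/black": {2}}
filtered = search(index, modify_query(["corvette"], "n2h2/black", exclude=True))
restricted = search(index, modify_query(["corvette"], "n2h2/black", exclude=False))
```

Because the tag word is stored in the index like any other word, the same search routine serves both the excluding and the requiring form of the modified query.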

In another aspect, the invention provides a method of 
processing queries that select an electronic document from 
among a plurality of documents, the method comprising the 
steps of storing a tag word in an index in association with 
information identifying the electronic document, in which 
the tag word indicates that access to the electronic document 
is restricted; receiving a search query that requests the 
electronic document; modifying the search query to create a 
modified search query by adding a search term that refer- 
ences the tag word; and creating a set of search results by 
searching the index based on the modified search query. 

One feature of this aspect is that the modifying step 
further comprises using a software component installed in a 
browser to perform the steps of intercepting each search 
query entered using the browser; and modifying the search 
query that is intercepted to create the modified search query 
by adding the search term that references the tag word. A 
related feature is that the step of storing includes the steps of 
receiving specifications of one or more of the documents 
that are indexed in the index, in which each of the specifi- 
cations is associated with the tag word; and storing, in the 
index, information associating one or more of the documents 
that are indexed in the index with the tag word, according to 
the specifications. 

Still another aspect of the invention involves a method of 
constructing an index of a plurality of electronic documents 
for use in selecting electronic documents from among the 
plurality of electronic documents, comprising the steps of 
receiving data that indicates one or more tag words and 
criteria to be used to determine which of the plurality of 
documents should be associated with each of the one or 
more tag words, wherein the tag words are not derived from 
content of the electronic documents; storing a list of words 
that are within one document of the plurality of documents; 
and storing, in the index, information associating each of the 
one or more tag words with the one document when the one 
document satisfies the criteria associated with the tag words. 

According to yet another aspect, there is a method of 
constructing an index of a plurality of electronic documents 
for use in selecting electronic documents from among the 
plurality of electronic documents, comprising the steps of 
receiving data that indicates one or more document property 
values and criteria to be used to determine which of the 
plurality of documents should be associated with each of the 
one or more document property values, wherein the docu- 
ment property values are not derived from content of the 
electronic documents; storing a list of words that are within
one document of the plurality of documents; storing, in the
index, information associating each of the one or more
document property values with the one document when the
one document satisfies the criteria associated with the
document property values.

Another aspect of the invention is a method of selecting
electronic documents from among a plurality of electronic
documents, the method comprising the steps of storing a
document property value in an index in association with
information identifying an electronic document, in which
the document property value comprises data that is not
derived from content of the electronic document; receiving
a search query; modifying the search query to create a
modified search query by adding to the search query a search
term that references the document property value; and
creating a set of search results by searching the index based
on the modified search query.

The invention also encompasses a computer system, a
computer-readable medium, and a computer data signal
embodied in a carrier wave that are configured to carry out
the foregoing steps.

The foregoing summary is not intended to describe or
summarize all features or aspects of the invention, which are
set forth fully in the following description and claims.

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention is illustrated by way of example,
and not by way of limitation, in the figures of the
accompanying drawings and in which like reference numerals refer
to similar elements and in which:

FIG. 1 is a block diagram of a document search system in
which one embodiment of the invention may be used.

FIG. 2A is a flowchart of a process of associating
non-content information with an index of documents.

FIG. 2B is a flowchart of an alternate process of
associating non-content information with an index of documents
as the index is constructed.

FIG. 3 is a diagram of an exemplary list of document
specifications that is received as input by the process of FIG.
2.

FIG. 4A is a diagram of a word index structure that is used
in an index organized according to a preferred embodiment
of the invention.

FIG. 4B is a diagram of a document index structure and a
document properties structure that are used in an index
organized according to a preferred embodiment of the
invention.

FIG. 4C is a diagram of a global data structure and a word
data structure that are used in an index organized according
to a preferred embodiment of the invention.

FIG. 4D is a diagram of the global data structure of FIG.
4C, and an adjacency index structure, that are used in an
index organized according to a preferred embodiment of the
invention.

FIG. 5 is a flowchart of a process of carrying out a search
query based on non-content information.

FIG. 6 is a block diagram of a computer system hardware 
arrangement that may be used to carry out an embodiment. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENT 

A method and apparatus, for selecting electronic docu- 
ments from among a plurality of electronic documents, is 
described. In the following description, for the purposes of
explanation, numerous specific details are set forth in order
to provide a thorough understanding of the present invention.
It will be apparent, however, to one skilled in the art
that the present invention may be practiced without these
specific details. In other instances, well-known structures
and devices are shown in block diagram form in order to
avoid unnecessarily obscuring the present invention.

I. FUNCTIONAL OVERVIEW

Techniques are disclosed for storing, in an index that has
been constructed from the content of a plurality of documents
of a hypertext system, information not found in the
documents. These techniques provide a way to associate
meta-information, editorial information, commentary, or other
tags with documents. Accordingly, the index, and indirectly
the documents of the hypertext system, can be
searched based upon non-content information.

In general, in one embodiment, the foregoing techniques
are carried out in the context of an index that has been
constructed for a plurality of documents. The index includes
an ordered list of words, where each word in the list has been
recognized in at least one of the documents. Each word in
the list is associated with one or more references to
documents that contain that word. The index further includes a
list of document location identifiers for all the documents
that are referenced in the index. One of the techniques
involves receiving a list of document specifications, which
may be expressed in literal or wildcard format. A "document
specification" is data that indicates criteria for associating
documents with a tag word, where the criteria are something
other than the documents actually containing the tag word.
Each document specification is associated with one or more
tag words. Each document location identifier is retrieved
from the list and matched against the list of document
specifications. If there is a match between the document
specification and a document location identifier, then the tag
word associated with the document specification is added to
the index, and the document associated with the document
location identifier that matches the document specification is
indexed against that tag word.

As a result, an existing document index is supplemented
with tag words, and documents matching one or more of the
document specifications are indexed against the tag words in
the same way that they would be indexed if they had actually
contained the tag words. Accordingly, the index can be
searched based upon the tag words, thereby allowing
documents in the system to be searched based upon information
that is not actually in the documents.

II. OPERATIONAL CONTEXT

A. SEARCH SYSTEM

Embodiments of the invention may be implemented in a
variety of contexts. A specifically preferred operational
context is a search system for the World Wide Web, including
a Web crawler or "spider" system, an index of Web
documents, and a search engine that can receive a search
query and find matching information in the index. The
invention and embodiments thereof are not limited to this
context, which is illustrated only as an example of how an
embodiment can be used.

FIG. 1 is a block diagram of an exemplary operating
context for an embodiment that involves an index of a search
engine, or similar facility, for a set of hypertext documents or
for a hypertext database. One or more servers 2a, 2b are
coupled by network connections 5 to a network 8. The
servers 2a, 2b may be associated with, or store, one or more
documents 4a, 4b, as shown in the case of server 2b.
Alternatively, a server 2a may have a front end component
6, such as a data presentation layer, that can receive
information about documents 4a, 4b and specially format the
information for presentation to a client.

In the preferred embodiment, servers 2a, 2b are Hypertext
Transfer Protocol (HTTP) servers, the network connections
5 are TCP/IP connections or other connections that support
HTTP transfers, and the network 8 is the global,
packet-switched network known as the Internet.

By way of example, two documents 4a, 4b are shown. In
practice, however, there may be millions of
documents accessible through thousands of servers. The
location of each document is uniquely identified by a
location identifier. An example of a location identifier is a
Uniform Resource Locator (URL).

One or more clients 10a, 10b are coupled by network
connections to the network 8. The clients 10a, 10b are
personal computers, workstations, or servers that can request
documents 4a, 4b from the network 8 and present the
documents, or information relating to the documents, to a
user. In a preferred embodiment, a client 10b executes a
browser program 12, such as a World Wide Web browser,
the client 10b connects to network 8 using a TCP/IP
connection, and connects to one of the servers 2a, 2b using
the HTTP protocol. There may be millions of clients 10a,
10b in a practical embodiment of the system.

In one mode of operation, a client 10b submits one or
more requests to server 2b to retrieve a particular document
from among documents 4a, 4b. The server 2b locates the
requested document and returns a copy of it to the client 10b
through the network 8. A request and response in the HTTP
protocol can be used to carry out this mode of operation.

Preferably, a search engine 14 is coupled to the network
8 and to an index 16. The search engine 14 is a specialized
set of one or more software components. The particular
internal construction of the search engine 14 is not
important, and the structure of a search engine is known in
this field. What is important is that the search engine 14 can
receive a search request from one of the clients 10a, 10b or
one of the servers 2a, 2b, search the index 16 to identify one
or more records that are within the scope of the search
request, and return information from the records ("search
results") to the requesting client through network 8.

A crawler 18 is coupled to the network 8 and to an indexer
20. The crawler 18 and indexer 20 cooperate to periodically
visit documents 4a, 4b, and all other documents that are
accessible through the network 8, and construct the index 16
based on the contents of the documents. The crawler 18 and
indexer 20 may comprise software components that carry
out the foregoing functions, and may be integrated as one
component.

Crawler 18 and indexer 20 may construct and operate on
one or more interim indexes that are created offline and
made "live" later. An interim index is constructed during an
offline crawling and indexing phase, and when the interim
index is complete, it is merged into the index 16. In this way,
the crawler 18 and indexer 20 can construct an index without
interfering in the operation of search engine 14, which uses
the "live" index 16. Use of interim indexes is also favored
because the indexing process has been found to be
memory-intensive. The indexer 20 can build a temporary index in
volatile memory, and then store the index information in an
interim index in non-volatile memory when the indexer runs
out of volatile memory.

B. CRAWLING AND INDEXING DOCUMENTS

Index 16 may comprise one or more tables, files, or
sub-indexes. For example, index 16 may comprise a word
index and a document index. The word index is an alphabetic
list of all words encountered by crawler 18 and indexer
20 in all documents 4a, 4b. Each word in the word index is
associated with one or more document identifiers that
identify documents that contain the word. The document index
maps the document identifiers to specific document location
identifiers, or to URLs, or to other information that may be
displayed after a search, such as document title or document
abstract.
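The two sub-indexes described above can be pictured with a minimal in-memory sketch. The class and field names (`Index`, `DocumentRecord`, `add_document`, `urls_for`) and the sample URLs are hypothetical; the specification does not prescribe a storage layout at this point.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocumentRecord:
    """Display information kept in the document index (assumed fields)."""
    url: str
    title: str = ""
    abstract: str = ""


@dataclass
class Index:
    # Word index: word -> identifiers of documents that contain the word.
    word_index: Dict[str, List[int]] = field(default_factory=dict)
    # Document index: document identifier -> location and display information.
    document_index: Dict[int, DocumentRecord] = field(default_factory=dict)

    def add_document(self, doc_id: int, record: DocumentRecord, words: List[str]) -> None:
        self.document_index[doc_id] = record
        for word in words:
            postings = self.word_index.setdefault(word, [])
            if doc_id not in postings:
                postings.append(doc_id)

    def sorted_words(self) -> List[str]:
        """Return the words in alphabetic order, as the text describes the word index."""
        return sorted(self.word_index)

    def urls_for(self, word: str) -> List[str]:
        """Resolve a word to document locations through both sub-indexes."""
        return [self.document_index[d].url for d in self.word_index.get(word, [])]


idx = Index()
idx.add_document(1, DocumentRecord("http://www.example.com/a", "Page A"), ["pear", "apple"])
idx.add_document(2, DocumentRecord("http://www.example.com/b", "Page B"), ["apple"])
```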

Operation of crawler 18 and indexer 20 may involve
receiving and reading one or more document identifiers,
each of which identifies a hypertext document. For example,
crawler 18 receives a URL that identifies a
document among documents 4a, 4b. Crawler 18 may call a
process, procedure, program or subroutine and provide it
with a list of URLs that the crawler has not yet visited.
Crawler 18 may also retrieve a document identified by each
URL using an HTTP request.

Each document in the list of URLs is scanned and its
content is examined. Each hyperlink within the document is
identified. In one embodiment, the documents are formatted
using Hypertext Markup Language (HTML), and crawler 18
detects each HTML anchor and associated hypertext
reference in the document. The hyperlinks are added to a
crawling queue. The crawling queue is a list of document
identifiers or URLs that need to be visited by the process.
When the process completes processing of the location
identifiers that were previously obtained, the process
retrieves the next location identifier in the crawling queue.
In this way, the process eventually visits all documents to
which a particular document points.

For each document that is visited, the system generates a 
unique document identifier and stores the document identi- 
fier in a list of visited documents. In one embodiment, the 
list is implemented in the form of a vector of visited URLs, 
in which there is one entry for each visited URL. The 
process may later search the bit vector to determine whether 
the system has previously visited a particular URL. 
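The crawling queue and the record of visited URLs can be sketched as follows. This is an illustration under stated assumptions: `fetch_links` is a hypothetical callback standing in for the HTTP retrieval and anchor extraction, the link graph is in-memory test data, and a dictionary is used for the visited record where the text describes a vector.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl(seed_urls: Iterable[str],
          fetch_links: Callable[[str], List[str]]) -> Dict[str, int]:
    """Visit every URL reachable from the seeds and assign document identifiers.

    Returns a mapping of visited URL -> document identifier.
    """
    queue = deque(seed_urls)          # the crawling queue
    visited: Dict[str, int] = {}      # stands in for the vector of visited URLs
    while queue:
        url = queue.popleft()
        if url in visited:
            continue                  # already visited through another link
        visited[url] = len(visited)   # generate a unique document identifier
        for link in fetch_links(url):
            if link not in visited:
                queue.append(link)
    return visited


# Hypothetical link graph used in place of real HTTP requests.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
visited = crawl(["a"], lambda url: links.get(url, []))
```

The check against `visited` is what keeps the cycle from "c" back to "a" from being crawled twice.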

The content of each document is fed to indexer 20, which 
carries out two main functions. 

First, the indexer 20 constructs an index record of the
current document and stores the index record in the
document index of index 16. Each index record contains, among
other things, a hash value that uniquely represents the text
contents of the associated document. Each index record also
contains the location identifier of the current document, and
may also contain values of properties that may be displayed
in a search result page such as document title, document
summary, or others. The location identifier may also be
stored in hashed form.

Second, the indexer 20 reads each word in the document
and indexes the document under that word in the word index
of index 16. This function may involve reading a word from
a document, checking whether the word is in the word index,
adding the word to the word index if it is not found, and
associating a reference to the document with the word in the
word index. For example, after this process is completed for
a set of documents that contain the word "apple", a record
of the word index may comprise, in simplified form, the
values "apple," "26," "9," "107," "272." This record
indicates that the word "apple" appears in documents "26," "9,"
"107," and "272." The numeric values serve as references to
the documents or to the true locations of the documents in
the network.
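The second indexer function can be sketched in a few lines. The function name `index_document`, the tokenizing rule (lowercase runs of letters and digits), and the sample texts are assumptions; only the document numbers are taken from the "apple" example above.

```python
import re
from typing import Dict, List


def index_document(word_index: Dict[str, List[int]], doc_id: int, text: str) -> None:
    """Index one document under each word it contains."""
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Add the word to the word index if it is not found.
        postings = word_index.setdefault(word, [])
        # Associate a reference to the document with the word, once per document.
        if doc_id not in postings:
            postings.append(doc_id)


word_index: Dict[str, List[int]] = {}
for doc_id in (26, 9, 107, 272):
    index_document(word_index, doc_id, "An apple a day")
index_document(word_index, 5, "No fruit here")
```

After these calls, `word_index["apple"]` holds the references 26, 9, 107, and 272 in the order the documents were indexed, which mirrors the simplified record in the text.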

For efficiency and speed, the indexer 20 may store words
in the word index in hashed form. For each word in the
document, the indexer 20 applies a hash function to the
word. In the preferred embodiment, the MD5 hash function
is applied, which generates a fixed-length, 16-byte hash value. The
MD5 hash function is described in detail in B. Schneier,
"Applied Cryptography" (John Wiley & Sons, 2d ed. 1996),
at pp. 436. Use of the MD5 hash function is not required. It
is desirable to use a hash function that generates a
fixed-length hash value as output, has a uniform distribution of
values, and has a low collision rate, such that the hash value
uniquely identifies each word that is hashed.
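As a small illustration of storing words in hashed form, the sketch below uses the MD5 implementation in Python's standard `hashlib` module. The helper name `hash_word` and the choice of UTF-8 encoding are assumptions; the text does not say how a word is converted to bytes before hashing.

```python
import hashlib
from typing import Dict, List


def hash_word(word: str) -> bytes:
    """Return the fixed-length, 16-byte MD5 digest of a word (UTF-8 encoding assumed)."""
    return hashlib.md5(word.encode("utf-8")).digest()


# A word index keyed by hash value instead of by the word itself.
hashed_index: Dict[bytes, List[int]] = {}
hashed_index.setdefault(hash_word("apple"), []).append(26)
hashed_index.setdefault(hash_word("apple"), []).append(9)
```

Every key has the same length regardless of word length, which is the property the text singles out as desirable.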
As a result, a rapidly searchable index of all words in all
the crawled documents is created. The index is then
published to one or more search nodes. Publication may involve
sending one or more publication messages to one or more
search processes. The publication messages inform the
search processes that the sorted index is available for use in
searching. The search processes may implement, for
example, a Web document search service. An example of a
search service that may use the foregoing processes is the
HOTBOT® search service commercially available at the
URL http://www.hotbot.com/.
C. SEARCHING THE INDEX 

To search the index 16, browser 12 submits a search query
to search engine 14. The search query contains one or more
words, for example, "INKTOMI CORPORATION". The
search engine 14 matches words in the query to words in the
word list of index 16. The index returns, as search results,
information about the documents that are identified by
document identifiers associated with matching words in the
word list. The returned information may include the title of
a document, an abstract, the location identifier or URL of the
document. The search results are returned to browser 12 for
presentation to a user.
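A minimal sketch of this lookup follows. It assumes that a document must match every query word, which the text does not state, and it uses hypothetical titles and `example.com` URLs as sample data.

```python
from typing import Dict, List


def search(word_index: Dict[str, List[int]],
           document_index: Dict[int, Dict[str, str]],
           query: str) -> List[Dict[str, str]]:
    """Return display information for documents that match every query word (assumed AND)."""
    words = query.lower().split()
    if not words:
        return []
    matching = set(word_index.get(words[0], []))
    for word in words[1:]:
        matching &= set(word_index.get(word, []))
    # Resolve document identifiers to title and URL through the document index.
    return [document_index[doc_id] for doc_id in sorted(matching)]


word_index = {"inktomi": [9, 26], "corporation": [26, 107]}
document_index = {
    9: {"title": "Document nine", "url": "http://www.example.com/9"},
    26: {"title": "Document twenty-six", "url": "http://www.example.com/26"},
    107: {"title": "Document one hundred seven", "url": "http://www.example.com/107"},
}
results = search(word_index, document_index, "INKTOMI CORPORATION")
```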

Alternatively, the client 10a connects to the server 2a. The 
browser 12 submits a search query to server 2a. Server 2a 
forwards the search query to search engine 14 that produces 
search results as described above. The search results are 
returned to a server 2a. Front end 6 of server 2a formats the 
search results and delivers a formatted page that contains the 
search results to browser 12. 

III. ASSOCIATING DOCUMENTS WITH NON-
CONTENT INFORMATION 
A. SUPPLEMENTING THE INDEX 

FIG. 2A is a flow diagram of a process of associating 
non-content information with documents and the index 
16. In this context, "non-content information" refers to 
any information that does not form part of the literal 
content of a document, that is, information other than 
the words or other material of a document that are 
indexed by the indexer 20 in the manner described
above. The non-content information is represented by 
one or more tag words that are added to the index 16 
and associated with one or more documents. 
The process of FIG. 2 is undertaken after index 16 is 
constructed. In particular, the process of FIG. 2 presumes 
that another process has constructed a word index of the 
documents that are accessible in a system, and has con- 
structed a list or table of location identifiers or URLs of 
documents that are accessible to the system. 

As shown by block 202, the process receives a list of 
records, which may be presented to the process in the form 
of a data file. Each record comprises a document specifica- 
tion and one or more associated tag words. 

FIG. 3 is a block diagram of an exemplary list 130 that
comprises document specifications 132 and tag words 134.
List 130 may contain any number of pairs of document
specifications and tag words.
A tag word is any character string that is to be associated
with a document for search purposes. Often, the tag words
are dedicated code words, or words that are not normally
found in a document or dictionary, although this
characteristic is not required. Examples of tag words include
"n2h2/black" and "n2h2/white", as shown by tag words 138a, 138c
of FIG. 3. Other tag words may be properties or
meta-information such as the title of a document, abstract, or
others, as described further below. For example, a tag word
may be "ADVERTISEMENT" to indicate that its associated
Web page(s) contain advertising. A tag word may be
"VERIFIED" to indicate that its associated Web page(s) contain
factual information that has been verified by some
independent third party.

Each of the document specifications 132 specifies
matching criteria. The documents that satisfy the matching
criteria specified in a document specification are associated,
in the index, with the tag word 134 of the record having that
document specification. A document specification 132 may be,
for example, an expression that identifies the location of a
document in a network, such as a URL.

In a preferred embodiment, the document specifications
may be expressed in a wildcard format. Using a wildcard
format for the document specification allows a particular tag
word to be associated with more than one URL, without
requiring each URL to be identified literally. Document
specifications 136a-136c of FIG. 3 are expressed in wildcard
format. For example, document specification 136a is
http://*.hotsex.com/. The "*" character in document
specification 136a indicates that the document specification
includes any server within the domain "hotsex.com". When
the process of FIG. 2 processes document specification
136a, the code word 138a will be associated in the index 16
with any indexed document having a location identifier that
matches document specification 136a, as explained further
below.

No particular wildcard format or syntax is required. In
practice, however, having a formal wildcard specification or
a set of wildcard format rules is advantageous. A preferred
wildcard format or syntax is described further below.

The process retrieves the next location identifier or URL
in the table of location identifiers of index 16, as shown by
block 204. The loop formed by block 204 and block 208
represents a sequential retrieval of the location identifier or
URL of each document that is indexed in index 16.

The process then tests whether the current location
identifier matches any of the document specifications in the list
that was received in block 202, as shown by block 206. For
example, block 206 may involve matching a URL indexed
in index 16 to the document specification 136a of list 130.
If there is no match, then control is passed to block 208 to
obtain the next location identifier, if any. When document
specifications are in wildcard form, block 206 may involve
parsing the document specification according to one or more
wildcard format rules or syntax rules.

If there is a match, then the process has identified a
document indexed in index 16 that is within the scope of a
document specification of the list obtained in block 202. In
response, the process retrieves the tag word that is associated
with the matching document specification in the list. For
example, if the current URL of the index 16 matches
document specification 136a, then the process retrieves tag
word 138a from list 130. The process then determines
whether that tag word is currently in the word index of index
16, as shown by block 212. If the tag word is not in the word
index, then control is passed to block 214, in which the tag
word is inserted into the word index by adding a new record
to the word index.

Thereafter, or if the tag word is already in the word index,
control is passed to block 216. The process obtains a
document identifier that is associated with the current
location identifier. The document identifier is stored in the word
index in association with the tag word.

For example, in the context of an index of Web
documents, block 206 involves matching a URL indexed in
the system to one of the document specifications 132 of list
130, such as document specification 136a. If a match occurs,
the process retrieves a document identifier that is associated
with the matching URL. The process finds the value of tag
word 138a in the word index of index 16. The document
identifier is stored in the word index in association with that
word value.

As a result, a particular document identified by a
particular URL is now indexed in the system in association with a
tag word. The process is carried out or iterated for each URL
that is indexed in the system and for each of the document
specifications 132 in list 130.

The foregoing describes a process of taking an index after
it has been constructed and marking it up with non-content
information. Generally, indexing involves taking a
document, converting it into a list of words and their
positions, and merging these lists into a final index. FIG. 2B
is a flow diagram of an alternate embodiment whereby the
non-content information is added to the index at the time the
document is being indexed.

As shown in FIG. 2B, block 220, a list of document
specifications and tag words is received. Block 220 may
involve the same steps set forth above in connection with
block 202 of FIG. 2A. In block 222, the next document to
be indexed is retrieved. The steps of block 220 may occur in
the normal indexing process in which documents are
sequentially retrieved and indexed. Similarly, as part of the
normal indexing process, the document is converted into a
word list and a list of the positions of the words, as shown
in block 224.

In block 226, the document's location identifier is
compared to the document specifications that were received in
block 220. If a match occurs, then in block 228 the
non-content tag words received in block 220 are added to the
document's word list. In block 230, the word list and
position lists are merged into the final index. At block 232,
completion of the process yields the complete index.

Thus, in the embodiment of FIG. 2B, the indexing process 
is enhanced by doing the tag specification lookup at the time 
the document is converted into a word list. If any non- 
content tag words are found, then they are added to the word 
list before the list is merged into the index. 
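
By way of illustration only, the two tagging flows described above may be sketched in Python as follows. The sketch is not the disclosed implementation: the function names, the use of in-memory dictionaries for the word index and the URL table, and the use of the standard `fnmatch` module as a stand-in for the wildcard rules are all assumptions made for the example. In particular, `fnmatch` does not implement the port and implied-path rules of the preferred wildcard format.

```python
from fnmatch import fnmatchcase


def tag_existing_index(word_index, url_table, records):
    """FIG. 2A style: mark up an index that has already been built.

    word_index: dict mapping word -> set of document identifiers (assumed shape)
    url_table:  dict mapping URL -> document identifier (assumed shape)
    records:    list of (document_specification, [tag words]) pairs
    """
    for url, doc_id in url_table.items():        # blocks 204/208: each indexed URL
        for spec, tags in records:               # block 206: test each specification
            if fnmatchcase(url, spec):
                for tag in tags:
                    # blocks 212-216: add the tag word if absent, then store the doc id
                    word_index.setdefault(tag, set()).add(doc_id)
    return word_index


def words_for_document(url, content_words, records):
    """FIG. 2B style: extend one document's word list at indexing time."""
    word_list = list(content_words)              # block 224: the content word list
    for spec, tags in records:                   # block 226: compare URL to each spec
        if fnmatchcase(url, spec):
            word_list.extend(tags)               # block 228: append the tag words
    return word_list
```

In both functions a tag word ends up in the same structure as an ordinary content word, which is what later allows an unmodified query evaluator to filter on it.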
B. WILDCARD FORMAT FOR DOCUMENT SPECIFI- 
CATIONS 

Preferably, a document specification in wildcard format 
may be expressed according to one or more of the following 
wildcard format rules. 

A wildcard may appear in the IP address portion of a 
document specification. Table 1 compares examples of 
document specifications having a wildcard designation in 
the IP portion to the scope or meaning, as interpreted by the 
process of FIG. 2, of such specifications. 

TABLE 1

WILDCARD IN IP ADDRESS

DOC SPECIFICATION              SCOPE

http://206.19.112.14:8000      URLs matching this host (port 8000)
http://206.19.112.14           URLs matching this host on default port 80
http://206.19.112.14:[0-9]*    URLs matching this host on any port
http://206.19.112.*            URLs on this subnet (port 80)
http://206.19.*                URLs on this subnet (port 80)
http://206.*                   URLs on this subnet (port 80)



A wildcard may appear in a hostname. Table 2 compares 
examples of document specifications having a wildcard 
designation in the hostname to the meaning, as interpreted 
by the system, of such specifications. 



TABLE 2

WILDCARD IN HOSTNAME

DOC SPECIFICATION                       SCOPE

http://www.naughty.psiweb.com:8000      URLs matching this host (port 8000)
http://www.naughty.psiweb.com:[0-9]*    URLs matching this host (any port)
http://www.naughty.psiweb.com           URLs matching this host (port 80)
http://*.naughty.psiweb.com             URLs matching this subnet (port 80)
http://*.psiweb.com                     URLs matching this subnet (port 80)



A wildcard may appear in a path component. If no path 
component is specified, then any URL that matches the host 
specification will be tagged. That is, the absence of a path 
component implies the wildcard designation "/*". 

The path component of the URL can be specified using 
"UNIX-style" filename wildcard patterns. Patterns can be 
formed with the following elements. 

The "?" character matches any single character.

The "*" character matches any sequence of zero or more
characters.

The designation "[x . . . y]" matches any single character
specified by the set (x . . . y), where any character other than
minus sign or close bracket may appear in the set. A minus
sign may be used to indicate a range of characters. That is,
"[0-5abc]" is a shorthand designation for "[012345abc]".
More than one range may appear inside a character set;
[0-9a-zA-Z.] matches almost all of the legal characters for
a host name.

The designation "[^x . . . y]" matches any character not
in the set x . . . y, which is interpreted as described above for
the designation "[x . . . y]".

Some examples of document specifications that have 
wildcard elements in a path component include: 

http://www.childsafe.com/IC*

http://www.netguide.com/part?.html

http://www.nps.gov/fo[a-z]*/*

In the preferred embodiment, except for the special
wildcarding characters described above, the rest of the document
specification conforms to the URI syntax in the HTTP/1.1
specification, also called Request For Comment (RFC)
2068, which is published at
http://www.w3.org/Protocols/rfc2068/rfc2068.
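
For illustration, the path-component pattern elements listed above ("?", "*", "[x . . . y]" and the negated set) can be translated into a regular expression. The following Python sketch is one possible translation, not the disclosed parser: the function names are hypothetical, the "^" negation marker is assumed, and the sketch assumes that "*" may also match the "/" character, which the description does not state either way.

```python
import re


def path_pattern_to_regex(pattern):
    """Translate a path pattern using ?, *, [set] and [^set] into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "?":
            out.append(".")                      # any single character
        elif ch == "*":
            out.append(".*")                     # zero or more characters
        elif ch == "[":
            end = pattern.find("]", i + 1)
            body = pattern[i + 1:end] if end != -1 else ""
            negate = body.startswith("^")
            if negate:
                body = body[1:]
            if not body:
                out.append(re.escape(ch))        # unterminated or empty set: literal "["
            else:
                # Escape characters that are special inside a regex class; "-" keeps
                # its range meaning, as in "[0-5abc]".
                for special in ("\\", "[", "^"):
                    body = body.replace(special, "\\" + special)
                out.append("[" + ("^" if negate else "") + body + "]")
                i = end                          # skip past the closing bracket
        else:
            out.append(re.escape(ch))            # every other character is literal
        i += 1
    return re.compile("".join(out) + r"\Z", re.DOTALL)


def path_matches(pattern, path):
    """Return True when the whole path satisfies the pattern."""
    return path_pattern_to_regex(pattern).match(path) is not None
```

With this translation, the pattern "/fo[a-z]*/*" accepts "/foo/bar.html" but rejects "/fo1/bar.html", because the character after "fo" must fall in the range a-z.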

C. TAGGING WITH DOCUMENT PROPERTIES 

As noted above, in an alternate embodiment, an indexing 
system may also be supplemented with non-content docu- 
ment properties. Such supplementation may be done alone 
or in combination with supplementation by non-content 
document words.

In this embodiment, the index contains metawords and
document properties. "Metawords" are words that
documents get indexed against. Documents that contain
metawords can be located in the index by combining one or more
metawords in a Boolean query and submitting it to the
search engine. "Properties", however, cannot be used to
locate documents. Properties represent information stored in
the index for each document, which can be returned to the
client for display on the results page, after the query has been
evaluated. Properties include document Title, Abstract, and
URL, and/or other information that describes a document or
its characteristics.

Storage of non-content document properties is useful, for 
example, to enable a document search system to report 
descriptive information about documents in association with 
the results of a search. For example, assume that Company 
M has a team of people dedicated to evaluating Web pages 
and writing up small editorial comments on each page. An 
editorial might say, "This page is full of excellent
information and gets a Company M rating of 10." or alternately,
"This page gives an adequate description of Sailing in the
San Francisco Bay and gets a Company M rating of 6".
Company M may want to display this editorial information
in its results pages. Company M can achieve this by using
the above-described tagging mechanism to associate the
non-content editorial information as a document property.
The document property information would be stored in the 
structure illustrated in FIG. 4B. 
D. PREFERRED INDEX STRUCTURE 
FIG. 4A, FIG. 4B, FIG. 4C, and FIG. 4D are diagrams that
show a preferred structure of portions of index 16, which
facilitates association of non-content information with an
index of stored documents, and retrieval of documents based
on non-content information. Each of the structures described
below may be stored in volatile memory and periodically
stored in non-volatile storage such as disk storage. Each
structure may be organized as a table, file, variable, object,
or other data structure defined by an abstract data type in a
source program. The index 16 may also include other data
structures and program elements that support word indexing,
database lookups, and related functions. The particular
structure of these elements is not critical. What is important
is that the system provides a fast, efficient way to index
hypertext documents according to non-content information,
and a fast, efficient way to search the index according to a
search query and return a set of search results.

FIG. 4A is a diagram of a preferred embodiment of a word
index 400 that comprises one or more records 402a-402n.
Each record comprises a word hash value 404, an offset
value 406, a word data length value 408, an adjacency index
length value 410, and an adjacency data length value 412.
The word hash value 404 is generated by applying a hash
function, such as the MD5 function, to a word found in a
document. Collectively, the offset value 406, word data
length value 408, adjacency index length value 410, and
adjacency data length value 412 provide an indirect
reference or mapping to records, in a document index, for
documents in which the word identified by word hash value
404 appears.

In a preferred embodiment, the offset value 406, word
data length value 408, adjacency index length value 410, and
adjacency data length value 412 reference a word data table
414. The word data table 414 comprises a plurality of word
data records 414a-414n. Each of the word data records
414a-414n comprises a word data field 416, an adjacency
index field 417, and an adjacency data field 418. Each offset
value 406 of a word index record 402a-402n points to the
beginning of a record in the word data table 414. The word
data length value 408, adjacency index length value 410, and
adjacency data length value 412 respectively store the
lengths in bytes of the word data field 416, adjacency index
field 417, and adjacency data field 418 of the record
414a-414n to which the offset value 406 points. These
values enable the index 16 to rapidly and efficiently retrieve
data from the word data table 414 once a particular word of
interest has been identified in the word index 400. Broadly
speaking, the values enable the index 16 to know, in advance
of reading the word data table 414, exactly where to retrieve
needed information.
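
The relationship between a word index record and the word data table can be illustrated with a short Python sketch. It is offered only as an example under stated assumptions: the field widths (a 16-byte MD5 digest, an 8-byte offset and three 4-byte lengths), the little-endian byte order, the in-memory dictionary keyed by digest, and the names `RECORD`, `build` and `read_word_data` are all hypothetical and are not taken from the description.

```python
import hashlib
import struct

# Assumed record layout: word hash 404, offset 406, and length values 408, 410, 412.
RECORD = struct.Struct("<16sQIII")


def build(words_to_fields):
    """Build a word index and a word data table.

    words_to_fields: dict word -> (word_data, adjacency_index, adjacency_data) bytes
    """
    index = {}
    table = bytearray()
    for word, (wd, ai, ad) in words_to_fields.items():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        # The offset is where this word's record begins in the word data table.
        index[digest] = RECORD.pack(digest, len(table), len(wd), len(ai), len(ad))
        table += wd + ai + ad
    return index, bytes(table)


def read_word_data(index, table, word):
    """Return only the word data field, without reading the adjacency fields."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    record = index.get(digest)
    if record is None:
        return None
    _hash, offset, wd_len, _ai_len, _ad_len = RECORD.unpack(record)
    return table[offset:offset + wd_len]
```

Because the three lengths are known before the table is read, a query that needs no adjacency information can fetch exactly `wd_len` bytes and stop.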

Each word data field 416 stores a "document vector," 
which is a list of document identifiers in which the associ- 
ated word appears. Preferably, the document vector is delta- 
compressed. Collectively, an adjacency index value 417 and 
an adjacency data value 418 represent a word "position 
vector" or set of adjacency information. Each adjacency 
index value 417 stores a set of length values, in which there 
is one length value for each document in the document 
vector. Each adjacency data value 418 stores a set of
variable-length position vectors. The adjacency information
defines what words are located adjacent to a particular word,
and is used when searching for word combinations. For
example, if a user enters the search query "cat in the hat", the
system uses the adjacency information to determine which
index entries represent words that are adjacent to another 
word in the search query. In one embodiment, the adjacency 
information stores values indicating positions of the word 
within the document, which values can be used when 
searching for adjacent word combinations. 
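
As an illustration of how position information can support a phrase search, the following Python sketch checks whether the words of a phrase occur at consecutive positions within the same document. The data shape (a dictionary of per-document position lists), the use of word positions rather than byte offsets, and the function name `phrase_match` are assumptions of the example.

```python
def phrase_match(positions_by_word, phrase):
    """Return the set of document identifiers that contain the phrase.

    positions_by_word: dict word -> dict doc_id -> list of word positions
    """
    words = phrase.split()
    if not words:
        return set()
    hits = set()
    for doc_id, starts in positions_by_word.get(words[0], {}).items():
        for start in starts:
            # Every later word must appear exactly k positions after the first word.
            if all(start + k in positions_by_word.get(w, {}).get(doc_id, ())
                   for k, w in enumerate(words[1:], 1)):
                hits.add(doc_id)
                break
    return hits
```

A single-word query never consults these position lists, which is consistent with keeping adjacency data apart from the document vector.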

FIG. 4C is a diagram of certain internal details of a
preferred embodiment of the word data table 414. Each word
data value 416 preferably stores a document count value
416a, and one or more document info values 416b, 416c,
416d. The document count value identifies the number of
document info values 416b, 416c, 416d that follow. Each
document info value summarizes information about the
word in a document, and comprises a document identifier
416ba and a score value 416bb. The document identifier
value 416ba uniquely identifies a document in the index 16.
For example, a document identifier value 416ba of "44"
represents the 44th document in the index.

The score value 416bb preferably is a one-byte encoded 
score. In one embodiment, the score value is used to 
determine how relevant the word is to the document that it
is indexed against. For regular document content words, the 
score value represents the number of times the word appears 
in the document, normalized over the document's length. 
For example, if the word appears twenty (20) times in a 
relatively short document, the score value will be high, 
whereas if the word appears once in a long document, the 
score will be low. For tag words, the score may be a fixed 
value, such as "100". In an alternate embodiment, a user or 
customer may specify a score for each tag word. 
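
The description does not give a formula for the one-byte score, so the following Python sketch is purely hypothetical. It assumes a linear normalization of the occurrence count by document length, clamped to the range of one byte, and a fixed score of 100 for tag words; the `scale` parameter and the function names are invented for the example.

```python
TAG_WORD_SCORE = 100  # fixed score for tag words, per the example value in the text


def content_word_score(occurrences, document_length, scale=2000):
    """Hypothetical one-byte score: occurrences normalized over document length."""
    if document_length <= 0:
        return 0
    return min(255, round(scale * occurrences / document_length))


def score_for(word, occurrences, document_length, tag_words):
    """Use the fixed score for tag words and the normalized score otherwise."""
    if word in tag_words:
        return TAG_WORD_SCORE
    return content_word_score(occurrences, document_length)
```

Under these assumptions a word appearing twenty times in a 200-word document scores far higher than a word appearing once in a 10,000-word document, matching the qualitative behavior described above.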

Preferably, the document identifier values 416ba are
compressed by delta encoding and by using a variable-length
coding of the non-zero bytes. Delta encoding involves
computing the difference between a current document
identifier value and the previous document identifier value. The
least significant bit of each byte of a document identifier
value 416ba is used as a flag that indicates whether another
byte follows. The remaining seven (7) bits of each byte
together form an integer value, with the least significant bits
appearing last. This structure provides a compact and
efficient representation of the document identifier values.
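
The encoding just described can be sketched in Python as follows. The sketch follows the stated bit layout (the least significant bit of each byte is the continuation flag, the other seven bits carry data, most significant group first). The function names, the choice of zero as the implicit identifier before the first entry, and the requirement that identifiers be supplied in ascending order are assumptions of the example.

```python
def encode_number(n):
    """Encode a non-negative integer as 7-bit groups, most significant group first."""
    groups = []
    while True:
        groups.append(n & 0x7F)
        n >>= 7
        if n == 0:
            break
    groups.reverse()
    out = bytearray()
    for pos, group in enumerate(groups):
        more = 1 if pos < len(groups) - 1 else 0   # flag bit: another byte follows
        out.append((group << 1) | more)
    return bytes(out)


def encode_doc_ids(doc_ids):
    """Delta-encode an ascending list of document identifiers."""
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        out += encode_number(doc_id - previous)
        previous = doc_id
    return bytes(out)


def decode_doc_ids(data):
    """Invert encode_doc_ids."""
    doc_ids = []
    previous = 0
    value = 0
    for byte in data:
        value = (value << 7) | (byte >> 1)
        if byte & 1 == 0:                          # flag clear: last byte of this delta
            previous += value
            doc_ids.append(previous)
            value = 0
    return doc_ids
```

For example, the delta 300 occupies two bytes (0x05, 0x58), while any delta below 128 occupies a single byte, so densely populated document vectors stay small.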

FIG. 4D is a diagram of the word data table 414 showing
details of the adjacency index values 417. Each adjacency
index value 417 is organized in parallel to the word data
value 416 of the same record 414a-414n, and has the same
number of entries. Each entry is a length value 417a-417d
and corresponds to a document info value 416b, 416c, 416d
of a word data value 416 in the same record 414a-414n.
Each length value 417a-417d represents the length in bytes
of a position vector in the adjacency data value 418 of that
record. Preferably, each length value 417a-417d is
delta-encoded and zero-compressed.

This storage scheme is used to improve system perfor- 
mance. Better performance for each query is achieved, in 
part, by reducing the amount of disk I/O. The vast majority 
of queries do not require use of adjacency information; only 
phrase searches, such as "United States", require the adja- 
cency information. By separating the word data from the 
adjacency data, the system minimizes the amount of data 
that is read from disk for the typical query. The delta- 
encoded and zero-compressed values help reduce the 
amount of space they occupy, further reducing I/O. 

Each adjacency data value 418 comprises one or more
position vector values 418a-418n. Each position vector
value 418a-418n stores an offset, in bytes, from the
beginning of the document, in which the word that is associated
with the current record 414a-414n appears. Preferably, each
position vector value is delta-encoded and zero-compressed.

FIG. 4B is a diagram of a document index structure 420
and a document properties structure 430 that are referenced
by the document identifier value 416ba. The document index
structure 420 provides an indirect mapping of the document
identifier value 416ba into the document properties structure
430. The document index structure 420 comprises a plurality
of records 422a-422n, in which each record has an offset
value 424 and a length value 426. The document identifier
value 416ba of the word data value 416 is equivalent to the
relative position of records in the document index structure
420. Thus, a document identifier value 416ba having a value
of "4" references the fourth record 422a-422n of the
document index structure 420.

The document properties structure 430 comprises a
plurality of record pairs 432a-432n. Each record pair comprises
a fixed-size header 434 and a variable-size bindings section
436. Headers 434 store static document information such as
the time the document was crawled or scanned, format,
document length, etc. The bindings sections 436 each store
document property information in the form of one or more
tag/value pairs. Tags and values are null-terminated strings.
Tags are one- or two-character mnemonics. Values are text
strings that represent a property value. For example, the tag
"U" means "URL", and an associated value might be
"http://www.inktomi.com".

Each offset value 424 of the document index structure 420 
specifies or points to a relative offset of the document 
properties structure 430. Each length value 426 specifies the 
length in bytes of the record pair 432a-432n to which an 
associated offset value 424 points. 
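
The indirection from a document identifier to its properties can be illustrated with the Python sketch below. The header layout (crawl time, format code, document length), the widths of the offset and length values, the "T" tag for a title, and all function names are assumptions made for the example. Only the "U" tag and the null-terminated tag/value layout come from the description.

```python
import struct

HEADER = struct.Struct("<IHI")   # assumed header 434: crawl time, format, doc length
ENTRY = struct.Struct("<QI")     # assumed widths for offset value 424, length value 426


def pack_record_pair(crawl_time, fmt, doc_len, bindings):
    """Build one record pair: fixed-size header plus null-terminated tag/value pairs."""
    body = b"".join(tag.encode() + b"\0" + value.encode() + b"\0"
                    for tag, value in bindings.items())
    return HEADER.pack(crawl_time, fmt, doc_len) + body


def build_structures(record_pairs):
    """Return (document index structure, document properties structure) as bytes."""
    doc_index = bytearray()
    props = bytearray()
    for pair in record_pairs:
        doc_index += ENTRY.pack(len(props), len(pair))
        props += pair
    return bytes(doc_index), bytes(props)


def lookup_properties(doc_index, props, doc_id):
    """Resolve a 1-based document identifier to (header fields, bindings)."""
    offset, length = ENTRY.unpack_from(doc_index, (doc_id - 1) * ENTRY.size)
    pair = props[offset:offset + length]
    header = HEADER.unpack_from(pair, 0)
    fields = pair[HEADER.size:].split(b"\0")[:-1]  # drop the piece after the final NUL
    bindings = {fields[i].decode(): fields[i + 1].decode()
                for i in range(0, len(fields), 2)}
    return header, bindings
```

Because every document index record has the same size, identifier 4 is located by simple arithmetic at the fourth record, and the stored length bounds the variable-size bindings section.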

Using this structure, a particular document may be rapidly 
and efficiently located based upon a word that appears in the 
document. Words are indexed in the word index 400. A 
particular word of interest is found in the word index 400 
using a search based on hash value. Information in a
matching record of the word index 400 points to a
record of the word data table 414. A document identifier
value 416ba in the record of the word data table points to the
document index structure 420. The document index struc- 
ture 420 points to a record of the document properties 
structure 430. Values in the document properties structure 
430 specifically identify a document that contains the
matched word. The values can be provided to a browser, or
the referenced document can be retrieved.

Other support structures and files for index 16 may be
provided. For example, index 16 may include a word list
("lexdata") file that contains a list of all words in the
database. Index 16 may include a database types ("dbtypes")
file that contains a list of database identifiers for the index.
It may be used by the search engine 14 to allow searches that
are restricted to a subset of cluster nodes.

A "cluster" is a group of tightly coupled workstations
used to achieve parallelism, fault-tolerance and scalability.
Each workstation in the cluster is called a "node". In a
preferred embodiment, the word index is distributed across
the nodes in a cluster of workstations. Each workstation
holds the index information for a subset of the World Wide
Web. The aggregate index is called a "database".

Index 16 also may include a deleted document list
("docid" file) that stores a list of deleted documents,
represented by a list of document identifiers and, optionally, a
location identifier or URL. Index 16 may also have a server
info list ("serverInfo" file) that stores a record for each
unique server that the indexed documents came from. Each
record may store a server type value and server IP address
value derived from the "Server" header of an HTTP
response that is received in response to a request to retrieve
a document from that server. Index 16 may also contain a
version file that stores information identifying the current
[illegible], to enable proper [illegible].

IV. SEARCHING BASED ON NON-CONTENT
INFORMATION

In one mode of use, the techniques disclosed herein can
be used in a variety of useful document search applications.

FIG. 5 is a flow diagram showing a process of searching
an index, which has been tagged in the manner described
above, to provide a service that filters Web documents to
eliminate documents available over the Web that are
considered undesirable for viewing or review by children.

In block 502, tag words to identify desirable or
undesirable documents are determined. Block 502 may involve
coining or deciding upon appropriate tag words for the
service to be provided. There may be one, two, or more tag
words. For example, the tag words are "ACME/BAD" and
"ACME/GOOD". The tag words indicate, respectively, that
Acme Corporation has reviewed a particular page and
determined it to be bad or good for children.

The index 16 is supplemented by associating documents
in the index with one or more of the tag words, as indicated
in block 504. Assume, for purposes of this example, that all
bad documents containing the word SEX have been indexed
in the index 16 in association with the tag word
"ACME/BAD." This step may be carried out using the process of
FIG. 2. For example, Acme provides to index 16 a list of
document specifications that identify bad Web documents
containing the word SEX and that are to be associated with
tag word "ACME/BAD." There may be documents indexed
in the index that contain the word SEX but are not deemed
bad, and which are not indexed in association with the tag
word.

A search query is formulated and received by the process,
as shown by block 506. For example, suppose a child at the
workstation enters the search query "SEX". Internally, this
query is represented in the format "+SEX". The "+"
character reflects Boolean AND logic. Thus, the character string
"+SEX" means "find documents that contain the word
SEX."

In block 508, the search query is modified in a manner that
excludes one or more of the tag words from the scope of the
search query. For example, the browser 12 intercepts the
search query and adds the command "-ACME/BAD" to the
query. The "-" character reflects Boolean NOT logic.
Thus, the query will omit any document that is indexed
under "ACME/BAD". In one embodiment, Acme provides a
browser plug-in program that the user installs in association
with the browser 12, and the plug-in program intercepts the
search query and modifies it. Alternatively, browser 12 has
a built-in process that modifies search queries. According to
another alternative, multiple different search engines use the
same search directory, but apply different filters to the search
query. This allows the search results to be tailored to a
particular audience that is served by each search engine. For
example, a search engine intended for an audience of
professionals in the medical field would apply a filter that
includes documents of interest to the medical community in
the search results. According to yet another alternative, a
user may select the filters that the user wishes to use in the
query, and store the filters in a "cookie" file in association
with the browser. In this alternative, the search engine is
tailored personally to the preferences of an individual user.
Other alternatives: (1) different search engines use the
same search directory but apply different filters, which allows
audience-tailored search engines (e.g., a kid-safe search
engine); (2) let the user select the filters he wants to use and
store them in a cookie to personally tailor the search engine.

The browser sends an HTTP request, containing the
modified search query, to the search engine 14, as shown by
block 510. At block 512, the search engine 14 parses the
search request to determine its meaning and how to query
the index 16 properly. At block 514, search engine 14
executes a search against the index 16, which has been
supplemented with the tag words as described above. In
block 516, search engine 14 receives a set of results
from the index 16. Because of the manner in which the
search query has been modified, all documents indexed
under the tag word "ACME/BAD" are excluded from the
search results. In block 518, the search engine 14 returns
the search results to the browser 12.

As a result, browser 12 receives a filtered set of search
results in which documents deemed bad for children have
been removed. The child does not see Web documents that
have been indexed under "ACME/BAD."

In another embodiment, the documents contain factual or
allegedly factual information, and the tag words indicate
whether an impartial third party has verified the truth of the
information. The tag words are used to limit a Web search
to only those documents that have been verified by the third
party.

In still another embodiment, a tag word indicates whether
a document contains advertising. The tag word is used to
formulate a search query that will filter out advertising from
the search results.

Thus, embodiments of the invention are applicable to any
context in which a third party labels Web documents with a
label that contains meta-information or other descriptive
information.

Another advantage is that, since the tags are indexed just
as if they were words contained in the document, no special
modifications have to be made to the indexing or search
logic in order to support non-content based filtering.

Further, embodiments of the invention may be used to
implement a service that can filter out, from a set of search
results, documents that contain a particular search term but
are not about that search term. For example, consider a
search query submitted by a dog aficionado that comprises
the word "DOGS". The tag word is "AKC/DOGS." The
American Kennel Club, which is a respected authority on
dogs in the United States, provides a list of specifications
which, in their simplest form, may be URLs, that
correspond to documents that the AKC has verified to be
genuinely about dogs. The search query is automatically
converted to a query in the form "AKC/DOGS". Thus, the
search query retrieves only documents about dogs that have
been reviewed and approved by the American Kennel Club.
Moreover, documents that contain the word "DOGS" but
that have nothing to do with dogs are filtered out of the
search results. Thus, the invention provides an effective
weapon against "spamming" of an indexing system.
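
The automatic query modification described in this section can be illustrated with a small Python sketch. It is an example under stated assumptions rather than the disclosed search engine: the word index is modeled as a dictionary of sets, the function names are hypothetical, and for the dog example the sketch assumes the tag word is added as a further required term alongside the user's term.

```python
def modify_query(query, exclude_tags=(), require_tags=()):
    """Append tag-word terms to a query written with '+' and '-' prefixes."""
    terms = query.split()
    terms += ["+" + tag for tag in require_tags]   # limit results to tagged documents
    terms += ["-" + tag for tag in exclude_tags]   # drop tagged documents
    return " ".join(terms)


def run_query(word_index, query):
    """Evaluate '+term' (must be indexed under) and '-term' (must not be).

    word_index: dict term -> set of document identifiers (assumed shape)
    """
    terms = query.split()
    required = [t[1:] for t in terms if t.startswith("+")]
    excluded = [t[1:] for t in terms if t.startswith("-")]
    if not required:
        return set()
    result = set(word_index.get(required[0], set()))
    for term in required[1:]:
        result &= word_index.get(term, set())
    for term in excluded:
        result -= word_index.get(term, set())
    return result
```

Note that `run_query` treats "ACME/BAD" exactly like any content word, which reflects the stated advantage that no special search logic is needed for non-content filtering.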

In another embodiment, a search service comprises one or
more search nodes and one or more master nodes. Multiple
search nodes are used to distribute search loading and to
distribute index loading. Collectively, the search nodes and
master node are called "back end" elements. In the preferred
embodiment, one or more separate servers are "front end"
elements. The front-end elements provide an interface to a
browser. For example, the front-end elements may format
the search results and may present the search results to the
browser in a custom or specialized manner. An example of
such presentation is a customized result page prepared using
an HTML template.

One of the master nodes accepts an HTTP connection and
receives a search request from a browser or from another
processor. The master node broadcasts the search request to
all nodes of the search system. Each node contains a search
engine that can open a socket connection to the index,
including the structures described above. Each node
executes the search request against the index structures.
Each node returns a set of search results to the master node.
The master node merges all search results and returns the
merged search results to the client. When the master node
merges the search results, the master node reads the search
query and applies it to the search results, thereby adding or
eliminating documents that are indexed under a particular
tag word that appears in the search request. The merged
search results, filtered according to the request and the tag
word, are sent to the front-end elements. The front-end
elements format the search results and display them at the
browser.
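
The broadcast-and-merge flow can be sketched in Python as follows. The sketch is a simplification under stated assumptions: nodes are modeled as in-process dictionaries rather than networked workstations, each hit is assumed to carry its score and its tag words so that the master can filter while merging, the nodes are queried sequentially, and the function names are hypothetical.

```python
def search_node(node_index, word):
    """Return (doc_id, score, tags) hits from one node's share of the index."""
    return [(doc_id, score, frozenset(tags))
            for doc_id, score, tags in node_index.get(word, [])]


def master_search(nodes, word, excluded_tags=()):
    """Send the request to every node, then merge and filter the results."""
    excluded = set(excluded_tags)
    merged = []
    for node_index in nodes:                       # broadcast to all nodes
        for doc_id, score, tags in search_node(node_index, word):
            if not (tags & excluded):              # apply the tag filter while merging
                merged.append((doc_id, score))
    merged.sort(key=lambda hit: (-hit[1], hit[0])) # highest score first
    return merged
```

The merged list is what a front-end element would then format, for example with an HTML template.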

V. HARDWARE OVERVIEW 

FIG. 6 is a block diagram that illustrates a computer
system 600 upon which an embodiment of the invention
may be implemented. Computer system 600 includes a bus
602 or other communication mechanism for communicating
information, and a processor 604 coupled with bus 602 for
processing information. Computer system 600 also includes
a main memory 606, such as a random access memory
(RAM) or other dynamic storage device, coupled to bus 602
for storing information and instructions to be executed by
processor 604. Main memory 606 also may be used for
storing temporary variables or other intermediate
information during execution of instructions to be executed by
processor 604. Computer system 600 further includes a read
only memory (ROM) 608 or other static storage device
coupled to bus 602 for storing static information and
instructions for processor 604. A storage device 610, such as a
magnetic disk or optical disk, is provided and coupled to bus
602 for storing information and instructions.

Computer system 600 may be coupled via bus 602 to a 
display 612, such as a cathode ray tube (CRT), for displaying
information to a computer user. An input device 614, includ-
ing alphanumeric and other keys, is coupled to bus 602 for
communicating information and command selections to 
processor 604. Another type of user input device is cursor 
control 616, such as a mouse, a trackball, or cursor direction 
keys for communicating direction information and com- 
mand selections to processor 604 and for controlling cursor 
movement on display 612. This input device typically has 
two degrees of freedom in two axes, a first axis (e.g., x) and 
a second axis (e.g., y), that allows the device to specify 
positions in a plane. 

The invention is related to the use of computer system 600
for selecting electronic documents from among a plurality of
such documents. According to one embodiment of the
invention, selecting electronic documents is provided by computer
system 600 in response to processor 604 executing one or
more sequences of one or more instructions contained in
main memory 606. Such instructions may be read into main
memory 606 from another computer-readable medium, such
as storage device 610. Execution of the sequences of instruc-
tions contained in main memory 606 causes processor 604
to perform the process steps described herein. In alternative
embodiments, hard-wired circuitry may be used in place of
or in combination with software instructions to implement
the invention. Thus, embodiments of the invention are not
limited to any specific combination of hardware circuitry
and software.

The term "computer-readable medium" as used herein 
refers to any medium that participates in providing instruc- 
tions to processor 604 for execution. Such a medium may 
take many forms, including but not limited to, non-volatile 
media, volatile media, and transmission media. Non-volatile 
media includes, for example, optical or magnetic disks, such 
as storage device 610. Volatile media includes dynamic 
memory, such as main memory 606. Transmission media 
includes coaxial cables, copper wire and fiber optics, includ- 
ing the wires that comprise bus 602. Transmission media can 
also take the form of acoustic or light waves, such as those 
generated during radio-wave and infra-red data communi- 
cations. 

Common forms of computer-readable media include, for 
example, a floppy disk, a flexible disk, hard disk, magnetic 
tape, or any other magnetic medium, a CD-ROM, any other 
optical medium, punchcards, papertape, any other physical 
medium with patterns of holes, a RAM, a PROM, an
EPROM, a FLASH-EPROM, any other memory chip or
cartridge, a carrier wave as described hereinafter, or any 
other medium from which a computer can read. 

Various forms of computer readable media may be 
involved in carrying one or more sequences of one or more 
instructions to processor 604 for execution. For example, the 
instructions may initially be carried on a magnetic disk of a 
remote computer. The remote computer can load the instruc- 
tions into its dynamic memory and send the instructions over 
a telephone line using a modem. A modem local to computer 
system 600 can receive the data on the telephone line and 
use an infrared transmitter to convert the data to an infrared 
signal. An infrared detector can receive the data carried in 
the infrared signal and appropriate circuitry can place the 
data on bus 602. Bus 602 carries the data to main memory 
606, from which processor 604 retrieves and executes the 
instructions. The instructions received by main memory 606 
may optionally be stored on storage device 610 either before 
or after execution by processor 604. 

Computer system 600 also includes a communication 
interface 618 coupled to bus 602. Communication interface 
618 provides a two-way data communication coupling to a
network link 620 that is connected to a local network 622. 
For example, communication interface 618 may be an 
integrated services digital network (ISDN) card or a modem 
to provide a data communication connection to a corre-
sponding type of telephone line. As another example, com-
munication interface 618 may be a local area network 
(LAN) card to provide a data communication connection to 
a compatible LAN. Wireless links may also be implemented. 
In any such implementation, communication interface 618
sends and receives electrical, electromagnetic or optical 
signals that carry digital data streams representing various 
types of information. 

Network link 620 typically provides data communication 
through one or more networks to other data devices. For
example, network link 620 may provide a connection 
through local network 622 to a host computer 624 or to data 
equipment operated by an Internet Service Provider (ISP) 
626. ISP 626 in turn provides data communication services 
through the worldwide packet data communication network
now commonly referred to as the "Internet" 628. Local 
network 622 and Internet 628 both use electrical, electro- 
magnetic or optical signals that carry digital data streams. 
The signals through the various networks and the signals on 
network link 620 and through communication interface 618,
which carry the digital data to and from computer system 
600, are exemplary forms of carrier waves transporting the 
information. 

Computer system 600 can send messages and receive 
data, including program code, through the network(s), net-
work link 620 and communication interface 618. In the
Internet example, a server 630 might transmit a requested 
code for an application program through Internet 628, ISP 
626, local network 622 and communication interface 618. In 
accordance with the invention, one such downloaded appli-
cation provides for selecting electronic documents as
described herein. 

Processor 604 may execute the received code as it is 
received, and/or stored in storage device 610, or other 
non-volatile storage for later execution. In this manner,
computer system 600 may obtain application code in the 
form of a carrier wave. 

EXTENSIONS AND ALTERNATIVES 

In the foregoing specification, the invention has been
described with reference to specific embodiments thereof. It 
will, however, be evident that various modifications and 
changes may be made thereto without departing from the 
broader spirit and scope of the invention. The specification 
and drawings are, accordingly, to be regarded in an illus-
trative rather than a restrictive sense.

What is claimed is: 

1. A method of selecting electronic documents from 
among a plurality of electronic documents, the method 
comprising the steps of:
storing a tag word in an index in association with infor- 
mation identifying an electronic document, in which 
the tag word comprises data that does not appear in a 
content of the electronic document;
receiving a search query; 

modifying the search query to create a modified search 
query by adding to the search query a search term that 
references the tag word; and 

creating a set of search results by searching the index
based on the modified search query; 

wherein the step of storing includes the steps of: 
receiving data that indicates one or more tag words and
criteria to be used to determine which of the plurality 
of documents should be associated with each of the 
one or more tag words, and in which at least a portion 
of the data is expressed in a wildcard format; 

retrieving a location identifier of each of the documents 
that are indexed in the index; 

matching each location identifier to each of the criteria; 
and 

when one location identifier matches one of the criteria, 
storing, in the index, information associating such 
location identifier with one or more of the tag words. 
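
The storing steps recited in claim 1 (receive tag words with wildcard criteria, retrieve each location identifier, match, and store the association) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the claim does not define the wildcard rules, so the shell-style matching of Python's standard `fnmatch` module stands in for them, and the function name `tag_locations`, the tag word `zzadultzz`, and the example URLs are hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Set, Tuple


def tag_locations(locations: Iterable[str],
                  criteria: List[Tuple[str, List[str]]]) -> Dict[str, Set[str]]:
    """Associate tag words with every location identifier that matches a criterion.

    locations: location identifiers (for example URLs) of the indexed documents.
    criteria:  (wildcard pattern, tag words) pairs; the wildcard syntax here is
               shell-style, an assumption made for this sketch.
    """
    tags_by_location: Dict[str, Set[str]] = defaultdict(set)
    for location in locations:                 # retrieve each location identifier
        for pattern, tag_words in criteria:    # match it against each criterion
            if fnmatchcase(location, pattern):
                # On a match, store the association in the (in-memory) index.
                tags_by_location[location].update(tag_words)
    return dict(tags_by_location)


indexed = ["http://www.adult.example/page.html", "http://news.example/index.html"]
rules = [("http://*.adult.example/*", ["zzadultzz"])]
print(tag_locations(indexed, rules))
```

Because the tag word is stored against the location identifier rather than derived from the page text, it satisfies the requirement that the tag word not appear in the content of the document.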

2. A method of selecting electronic documents from 
among a plurality of electronic documents, the method 
comprising the steps of: 

storing a tag word in an index in association with infor- 
mation identifying an electronic document, in which 
the tag word comprises data that does not appear in a 
content of the electronic document; 

receiving a search query; 

modifying the search query to create a modified search
query by adding to the search query a search term that
references the tag word; and

creating a set of search results by searching the index
based on the modified search query;
wherein the step of storing includes the steps of: 

receiving specifications of one or more of the docu- 
ments that are indexed in the index, in which each of 
the specifications is associated with one or more tag 
words, and in which one of the specifications is 
expressed in a wildcard format; 

retrieving a location identifier of each of the documents 
that are indexed in the index; 

matching each location identifier to each of the speci- 
fications by interpreting the one of the specifications 
that is in the wildcard format according to one or 
more wildcard format rules; and 

when one location identifier matches one of the 
specifications, storing, in the index, information 
associating such location identifier with one or more 
of the tag words. 

3. A method of processing queries that select an electronic 
document from among a plurality of documents, the method 
comprising the steps of: 

storing a tag word in an index in association with infor- 
mation identifying the electronic document, in which 
the tag word indicates that access to the electronic 
document is restricted; 

receiving a search query that requests the electronic 
document; 

modifying the search query to create a modified search 
query by adding a search term that references the tag 
word; and 

creating a set of search results by searching the index
based on the modified search query;
wherein the step of storing further includes the steps of: 

receiving specifications of one or more of the docu- 
ments that are indexed in the index, in which each of 
the specifications is associated with the tag word, 
and in which each of the specifications is expressed 
in a wildcard format; 

retrieving a location identifier of each of the documents 
that are indexed in the index; 

matching each location identifier to each of the speci- 
fications; and 




when one location identifier matches one of the 
specifications, storing, in the index, information 
associating such location identifier with the tag 
word. 

4. A method of processing queries that select an electronic 
document from among a plurality of documents, the method 
comprising the steps of: 

storing a tag word in an index in association with infor- 
mation identifying the electronic document, in which 
the tag word indicates that access to the electronic 
document is restricted; 

receiving a search query that requests the electronic 
document; 

modifying the search query to create a modified search 
query by adding a search term that references the tag 
word; and 

creating a set of search results by searching the index
based on the modified search query;
wherein the step of storing includes the steps of: 

receiving specifications of one or more of the docu- 
ments that are indexed in the index, in which each of 
the specifications is associated with the tag word, 
and in which one of the specifications is expressed in 
a wildcard format; 

retrieving a location identifier of each of the documents 
that are indexed in the index; 

matching each location identifier to each of the speci- 
fications by interpreting the one of the specifications 
that is in the wildcard format according to one or 
more wildcard format rules; and 

when one location identifier matches one of the 
specifications, storing, in the index, information 
associating such location identifier with the tag 
words. 

5. A method of processing queries that select an electronic 
document from among a plurality of documents, the method 
comprising the steps of: 

storing a tag word in an index in association with infor- 
mation identifying the electronic document, in which 
the tag word indicates that access to the electronic 
document is restricted; 

receiving a search query that requests the electronic 
document; 

modifying the search query to create a modified search 
query by adding a search term that references the tag 
word; and 

creating a set of search results by searching the index
based on the modified search query;
wherein the step of storing includes the steps of: 

receiving data that indicates one or more tag words and 
criteria to be used to determine which of the plurality 
of documents should be associated with each of the 
one or more tag words, and in which at least a portion 
of the data is expressed in a wildcard format;

retrieving a location identifier of each of the documents 
that are indexed in the index; 

matching each location identifier to each of the criteria; 
and 

when one location identifier matches one of the criteria, 
storing, in the index, information associating such 
location identifier with one or more of the tag words. 

6. A method of constructing an index of a plurality of 
electronic documents for use in selecting electronic docu- 
ments from among the plurality of electronic documents, the 
method comprising the steps of: 
receiving data that indicates one or more tag words and
criteria to be used to determine which of the plurality 
of documents should be associated with each of the one 
or more tag words, wherein the tag words do not appear 
in a content of the electronic documents;

storing a list of words that are within one document of the 
plurality of documents; 

storing, in the index, information associating each of the 
one or more tag words with the one document when the 
one document satisfies the criteria associated with the 
tag words; 

wherein the step of receiving data includes the steps of 
receiving data that indicates one or more tag words and 
criteria to be used to determine which of the plurality
of documents should be associated with each of the one 
or more tag words, and in which at least a portion of the 
data is expressed in a wildcard format; and 
wherein the step of storing information comprises the 
steps of retrieving a location identifier of each of the
documents; matching each location identifier to each of 
the criteria; and when one location identifier matches 
one of the criteria, storing, in the index, information 
associating such location identifier with one or more of 
the tag words.

7. A method of constructing an index of a plurality of 
electronic documents for use in selecting electronic docu- 
ments from among the plurality of electronic documents, the 
method comprising the steps of: 
receiving data that indicates one or more tag words and
criteria to be used to determine which of the plurality 
of documents should be associated with each of the one 
or more tag words, wherein the tag words do not appear 
in a content of the electronic documents; 
storing a list of words that are within one document of the
plurality of documents; 
storing, in the index, information associating each of the 
one or more tag words with the one document when the 
one document satisfies the criteria associated with the 
tag words; 

wherein the step of receiving data includes the steps of 
receiving specifications of one or more of the docu- 
ments that are indexed in the index, in which each of 
the specifications is associated with one or more tag
words, and in which one of the specifications is 
expressed in a wildcard format; 

and wherein the step of storing information comprises the 
steps of: 

retrieving a location identifier of each of the documents
that are indexed in the index; 
matching each location identifier to each of the speci- 
fications by interpreting the one of the specifications 
that is in the wildcard format according to one or
more wildcard format rules; and

when one location identifier matches one of the 
specifications, storing, in the index, information 
associating such location identifier with one or more 
of the tag words.

8. A method of constructing an index of a plurality of
electronic documents for use in selecting electronic docu-
ments from among the plurality of electronic documents, the
method comprising the steps of:

receiving data that indicates one or more document prop-
erty values and criteria to be used to determine which
of the plurality of documents should be associated with
each of the one or more document property values,
wherein the document property values do not appear in
a content of the electronic documents; 

storing a list of words that are within one document of the 
plurality of documents; and 

storing, in the index, information associating each of the
one or more document property values with the one 
document when the one document satisfies the criteria 
associated with the document property values. 

9. A method of selecting electronic documents from 
among a plurality of electronic documents, the method
comprising the steps of: 

storing a document property value in an index in asso- 
ciation with information identifying an electronic 
document, in which the document property value com-
prises data that does not appear in a content of the
electronic document; 

receiving a search query; 

modifying the search query to create a modified search 
query by adding to the search query a search term that
references the document property value; and 

creating a set of search results by searching the index 
based on the modified search query. 
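
The query-modification steps of claim 9 can be sketched as follows. This is an illustrative Python sketch only: the `-prop:` exclusion syntax, the function names `modify_query` and `search`, the in-memory index layout, and the property value `restricted` are all assumptions made for the example. The claim requires only that the added search term reference the document property value; exclusion is one use, the one named in the abstract.

```python
from typing import Dict, List, Set

# Hypothetical in-memory index: location identifier -> words found in the
# document and property values stored in association with it.
Index = Dict[str, Dict[str, Set[str]]]

EXCLUDE_PREFIX = "-prop:"  # assumed query syntax, not taken from the patent


def modify_query(query: str, property_value: str) -> str:
    """Create the modified query by adding a term that references the property value."""
    return f"{query} {EXCLUDE_PREFIX}{property_value}"


def search(index: Index, modified_query: str) -> List[str]:
    """Return locations that contain every query word and carry no excluded property."""
    required: List[str] = []
    excluded: Set[str] = set()
    for term in modified_query.split():
        if term.startswith(EXCLUDE_PREFIX):
            excluded.add(term[len(EXCLUDE_PREFIX):])
        else:
            required.append(term.lower())
    results = []
    for location, entry in index.items():
        if excluded & entry["properties"]:
            continue  # document is associated with an excluded property value
        if all(word in entry["words"] for word in required):
            results.append(location)
    return sorted(results)


index: Index = {
    "http://a.example/guide.html": {"words": {"travel", "guide"}, "properties": set()},
    "http://b.example/guide.html": {"words": {"travel", "guide"}, "properties": {"restricted"}},
}
print(search(index, modify_query("travel guide", "restricted")))
```

The user's original query is left untouched; the added term is the only place where the non-content information enters the search.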

10. A computer-readable medium carrying instructions for
selecting electronic documents from among a plurality of
electronic documents, the computer-readable medium com- 
prising instructions for performing the steps of: 

storing a tag word in an index in association with infor- 
mation identifying an electronic document, in which 
the tag word comprises data that does not appear in a
content of the electronic document; 

receiving a search query; 

modifying the search query to create a modified search
query by adding to the search query a search term that
references the tag word; and

creating a set of search results by searching the index
based on the modified search query;
wherein the step of storing includes the steps of: 

receiving data that indicates one or more tag words and
criteria to be used to determine which of the plurality 
of documents should be associated with each of the 
one or more tag words, and in which at least a portion 
of the data is expressed in a wildcard format; 

retrieving a location identifier of each of the documents
that are indexed in the index; 

matching each location identifier to each of the criteria; 
and 

when one location identifier matches one of the criteria, 
storing, in the index, information associating such
location identifier with one or more of the tag words. 

11. A computer-readable medium carrying instructions for
selecting electronic documents from among a plurality of
electronic documents, the computer-readable medium com-
prising instructions for performing the steps of:

storing a tag word in an index in association with infor- 
mation identifying an electronic document, in which 
the tag word comprises data that does not appear in a 
content of the electronic document; 

receiving a search query; 

modifying the search query to create a modified search 
query by adding to the search query a search term that 
references the tag word; and 

creating a set of search results by searching the index
based on the modified search query; 

wherein the step of storing includes the steps of:

receiving specifications of one or more of the docu-
ments that are indexed in the index, in which each of
the specifications is associated with one or more tag 
words, and in which one of the specifications is 
expressed in a wildcard format; 

retrieving a location identifier of each of the documents 
that are indexed in the index; 

matching each location identifier to each of the speci- 
fications by interpreting the one of the specifications 
that is in the wildcard format according to one or 
more wildcard format rules; and 

when one location identifier matches one of the 
specifications, storing, in the index, information 
associating such location identifier with one or more 
of the tag words. 

12. A computer-readable medium carrying instructions for
processing queries that select an electronic document from 
among a plurality of documents, the computer-readable 
medium carrying instructions for performing the steps of: 

storing a tag word in an index in association with infor- 
mation identifying the electronic document, in which 
the tag word indicates that access to the electronic 
document is restricted; 

receiving a search query that requests the electronic 
document; 

modifying the search query to create a modified search 
query by adding a search term that references the tag 
word; and 

creating a set of search results by searching the index
based on the modified search query;
wherein the step of storing further includes the steps of: 
receiving specifications of one or more of the docu- 
ments that are indexed in the index, in which each of 
the specifications is associated with the tag word, 
and in which each of the specifications is expressed 
in a wildcard format; 
retrieving a location identifier of each of the documents
that are indexed in the index;
matching each location identifier to each of the speci- 
fications; and 

when one location identifier matches one of the 
specifications, storing, in the index, information 
associating such location identifier with the tag 
word. 

13. A computer-readable medium carrying instructions for
processing queries that select an electronic document from 
among a plurality of documents, the computer-readable 
medium comprising instructions for performing the steps of: 

storing a tag word in an index in association with infor- 
mation identifying the electronic document, in which 
the tag word indicates that access to the electronic 
document is restricted; 

receiving a search query that requests the electronic 
document; 

modifying the search query to create a modified search 
query by adding a search term that references the tag 
word; and 

creating a set of search results by searching the index
based on the modified search query;
wherein the step of storing includes the steps of: 

receiving specifications of one or more of the docu- 
ments that are indexed in the index, in which each of 
the specifications is associated with the tag word, 
and in which one of the specifications is expressed in 
a wildcard format; 
retrieving a location identifier of each of the documents
that are indexed in the index; 

matching each location identifier to each of the speci- 
fications by interpreting the one of the specifications 
that is in the wildcard format according to one or
more wildcard format rules; and 

when one location identifier matches one of the 
specifications, storing, in the index, information 
associating such location identifier with the tag 
words.

14. A computer-readable medium carrying instructions for
processing queries that select an electronic document from 
among a plurality of documents, the computer-readable 
medium comprising instructions for performing the steps of: 

storing a tag word in an index in association with infor-
mation identifying the electronic document, in which
the tag word indicates that access to the electronic 
document is restricted; 

receiving a search query that requests the electronic 
document;

modifying the search query to create a modified search 
query by adding a search term that references the tag 
word; and 

creating a set of search results by searching the index
based on the modified search query;
wherein the step of storing includes the steps of: 

receiving data that indicates one or more tag words and 
criteria to be used to determine which of the plurality 
of documents should be associated with each of the 
one or more tag words, and in which at least a portion
of the data is expressed in a wildcard format; 

retrieving a location identifier of each of the documents 
that are indexed in the index; 

matching each location identifier to each of the criteria; 
and

when one location identifier matches one of the criteria, 
storing, in the index, information associating such 
location identifier with one or more of the tag words. 

15. A computer-readable medium carrying instructions for
constructing an index of a plurality of electronic documents
for use in selecting electronic documents from among the 
plurality of electronic documents, the computer-readable 
medium comprising instructions for performing the steps of: 

receiving data that indicates one or more tag words and
criteria to be used to determine which of the plurality 
of documents should be associated with each of the one 
or more tag words, wherein the tag words do not appear 
in a content of the electronic documents; 

storing a list of words that are within one document of the 
plurality of documents; 

storing, in the index, information associating each of the 
one or more tag words with the one document when the 
one document satisfies the criteria associated with the 
tag words; 

wherein the step of receiving data includes the steps of 
receiving data that indicates one or more tag words and 
criteria to be used to determine which of the plurality 
of documents should be associated with each of the one 
or more tag words, and in which at least a portion of the 
data is expressed in a wildcard format;

and wherein the step of storing information comprises the 
steps of retrieving a location identifier of each of the 
documents; matching each location identifier to each of 
the criteria; and when one location identifier matches 
one of the criteria, storing, in the index, information
associating such location identifier with one or more of 
the tag words. 
16. A computer-readable medium carrying instructions for
constructing an index of a plurality of electronic documents 
for use in selecting electronic documents from among the 
plurality of electronic documents, the computer-readable 
medium carrying instructions for performing the steps of: 

receiving data that indicates one or more tag words and 
criteria to be used to determine which of the plurality 
of documents should be associated with each of the one 
or more tag words, wherein the tag words do not appear 
in a content of the electronic documents; 

storing a list of words that are within one document of the 
plurality of documents; 

storing, in the index, information associating each of the 
one or more tag words with the one document when the 
one document satisfies the criteria associated with the 
tag words; 

wherein the step of receiving data includes the steps of 
receiving specifications of one or more of the docu- 
ments that are indexed in the index, in which each of 
the specifications is associated with one or more tag 
words, and in which one of the specifications is 
expressed in a wildcard format; 

and wherein the step of storing information comprises the 
steps of: 

retrieving a location identifier of each of the documents 
that are indexed in the index; 

matching each location identifier to each of the speci- 
fications by interpreting the one of the specifications 
that is in the wildcard format according to one or 
more wildcard format rules; and 

when one location identifier matches one of the 
specifications, storing, in the index, information 
associating such location identifier with one or more 
of the tag words. 

17. A computer-readable medium carrying instructions for
constructing an index of a plurality of electronic documents
for use in selecting electronic documents from among the
plurality of electronic documents, the computer-readable 
medium comprising instructions for performing the steps of: 

receiving data that indicates one or more document prop- 
erty values and criteria to be used to determine which 
of the plurality of documents should be associated with 
each of the one or more document property values, 
wherein the document property values do not appear in 
a content of the electronic documents; 

storing a list of words that are within one document of the 
plurality of documents; and 

storing, in the index, information associating each of the 
one or more document property values with the one 
document when the one document satisfies the criteria 
associated with the document property values. 

18. A computer-readable medium carrying instructions for
selecting electronic documents from among a plurality of 
electronic documents, the computer-readable medium com- 
prising instructions for performing the steps of: 

storing a document property value in an index in asso- 
ciation with information identifying an electronic 
document, in which the document property value com- 
prises data that does not appear in a content of the 
electronic document; 

receiving a search query; 

modifying the search query to create a modified search 
query by adding to the search query a search term that 
references the document property value; and 

creating a set of search results by searching the index 
based on the modified search query. 
Fig. 2 (flowchart; drawing not reproduced). Box labels, reference numerals 200 through 297:
Public Sub Main (starts program);
Private Sub ProcessCommandLine (parses the command line for meeting text);
Private Function CreateStopList (prepares global stop list);
Public Sub CreatePatterns (prepares all the patterns for pattern match);
Private Sub GoBackgroundFinder (wrapper function);
Public Function ParseMeetingText (extracts keywords from meeting record);
Public Function GoPatternMatch (initiates pattern matching);
Public Function SearchAltaVista (parse results);
Public Function SearchNewsPage (query and parse results);
Private Function ConstructOverallResult (prepares data);
Public Sub ConnectAndTransferToMunin (sends data to Munin);
Built-in Function Winsock.SendData (sends data through UDP);
Public Sub DisconnectFromMuninAndQuit (once data is sent, clean program and exit).
Fig. 6 (flowchart; drawing not reproduced). Box labels, reference numerals 620 through 670:
COMMAND LINE: user_id, meeting title, meeting body, list, location, time;
MESSAGE: user_id, meeting title, meeting body, participant list, time;
MEETING RECORD TO STORE CURRENT MEETING INFORMATION: stUSERID, sTitleOrig, sTitleKW, sBodyKW, sLocation, sTime, sParticipants( ), sMeetingText (original message minus user_id), and sCompany, sPeople, sTopic, sWhen, sWhere from GoPatternMatch;
SUBMIT QUERY TO ALTA VISTA;
SUBMIT QUERY TO NEWSPAGE;
STORE MESSAGE IN gResultOverall: msg_id, user_id, meeting title concatenated with stories;
PROCESS STORIES FROM ALTA VISTA AND NEWSPAGE.
Fig. 7 (flowchart; drawing not reproduced). Box labels:
A MEETING RECORD: POTENTIAL COMPANIES, PEOPLE, TOPICS, LOCATION AND A TIME ARE IDENTIFIED (710);
AT LEAST ONE TOPIC IS IDENTIFIED;
AT LEAST ONE COMPANY NAME IS IDENTIFIED;
A DECISION IS MADE ON WHAT MATERIAL TO TRANSMIT.



_ A MEETING RECORD - POTENTIAL COMPANIES, PEOPLE, TOPICS, 
n LOCATION AND A TIME ARE IDENTIFIED 

810 





AT LEAST ONE COMPANY NAME IS IDENTIFIED 



8 



AT LEAST ONE TOPIC IS IDENTIFIED 



I 



840 



USET THE TOPIC AND OR THE COMPANY 



Fig. 8 
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Fig. 9 



04/14/2003, EAST Version: 1.03.0007 



U.S. Patent Mar. 12, 2002 Sheet 10 of 29 US 6,356,905 B1




[Fig. 10B: flowchart of creating a customized content web page. Box labels recovered from the drawing sheet:]
Steps: User Requests Content Page; Get User Preferences; Get Page Content; Get User-Centric Content; Create Page Using Layout Preferences; Display Page to User; END
Data stores: User Profile Database; Content Database
Reference numerals legible on the sheet: 1001, 1002, 1003, 1006, 1007, 1008.



[Fig. 11: flowchart of retrieving user-centric content. Box labels recovered from the drawing sheet:]
Steps: Get User-Centric Content; Parse Content for Times, Dates, Contacts; Get Matching Calendar Items; Get Matching Email Items; Get Matching Contact Items; Get Matching Task List Items; Get Matching News Items; Return Content
Data stores: Content Database; Email Database; Contact Database; Task List Database; News Database
Reference numerals legible on the sheet: 1110, 1111, 1112, 1115, 1117, 1119, 1121, 1122.






[Drawing sheet 14 of 29: diagram; legible labels: mySite!, User, Persona]



[Drawing sheet 15 of 29: diagram; legible labels: Generic Intention, Intention Area, Intention Page]



[Fig. 15: flowchart of generating an agent's current statistics. Box labels recovered from the drawing sheet:]
Steps: START; User Requests Agent Statistics Page; Get User Statistics; Normalize Statistics; Get Statistics Formulas; Generate Graphs with Statistics; Create Statistics Page; Return Statistics Page to User
Data stores: User Profile Database; Content Database
Reference numerals legible on the sheet: 1510, 1520, 1530, 1550, 1570, 1580, 1590.



[Fig. 16: flowchart of determining the personalized product rating for a user. Box labels recovered from the drawing sheet:]
Steps: User Requests Product Report about Product X; Get User Profiles of Users Who Have Rated Product X; Get Profile Matching Algorithm Thresholds; Return threshold variables; Map Users According to Profile Matching Algorithm; Calculate statistics from n nearest neighbors (high, low, avg.) for features; Insert statistics into product report template; Return product report to user; END
Data store: User Profile Database
Reference numerals legible on the sheet: 1610, 1620, 1640, 1660, 1670, 1690, 1695, 1697.



[Fig. 17: no text recovered from the drawing sheet]






[Fig. 19: flowchart of the agent processing for generating a verbal summary. Box labels recovered from the drawing sheet:]
Steps: START; User Requests Summary Page; Get User Agent Preferences; Get Content; Summarize Content; Create Page Using Layout Preferences; Generate Agent Speech Text; Insert Agent Speech Text; Display Page to User
Data stores: User Profile Database; Content Database
Reference numerals legible on the sheet: 1930, 1940, 1960, 1970, 1980, 1995, 1997.




SYSTEM, METHOD AND ARTICLE OF 
MANUFACTURE FOR MOBILE 
COMMUNICATION UTILIZING AN 
INTERFACE SUPPORT FRAMEWORK 

FIELD OF THE INVENTION 

The present invention relates to agent based systems and 
more particularly to a mobile computing environment that 
accesses the Internet to obtain product information for a user 
through an interface support framework. 

BACKGROUND OF THE INVENTION 

Computer assistance in all environments is increasingly 
necessary as computer technology becomes increasingly 
embedded in society. Mobile computing technology 
addresses this issue by allowing the individual to access 
computer related information at all times and in all envi- 
ronments. 

One of the first major advances in mobile computer 
technology was the Personal Digital Assistant (PDA). A 
PDA allowed a user to access computer related information, 
yet fitted in the palm of the hand. Utilizing a PDA, the user
could organize personal affairs, write notes, calculate
equations, and record contact numbers in an address book. In
addition, PDAs were usually capable of interfacing with a
desktop computer, typically through a wire connection. The
connection allowed the PDA to download information from, and
upload information to, the desktop computer. Later
developments gave the PDA wireless capabilities. The wireless
capabilities allowed the PDA to interact with other comput- 
ers that were not physically connected to the PDA. 

Wireless PDAs could communicate with computers that 
were connected to the World Wide Web, and soon led to 
PDAs capable of Web browsing. One of the first companies 
to develop Web browsing capabilities for PDAs was Inter- 
com. 

Intercom's Falcon Mobile Server allowed PDAs with 
Web functions to directly connect to a host computer. Just by 
installing the software onto the host server, PDA terminals 
were able to access information through the World Wide 
Web. 

Currently, more integration in mobile computing is 
desired. Nokia, an Irving, Tex. company, has partially
addressed the integration issue by developing the Nokia
9000 wireless voice phone. The Nokia 9000 includes a small
keyboard, a specialized Web browser from microbrowser
vendor Unwired Planet, Inc., and a small VGA monitor.
Nokia worked with Ericsson Inc., Motorola Inc. and Unwired
Planet to establish the Wireless Application Protocol (WAP),
a standardized browser technology and server format. WAP 
gave manufacturers a standard way to put data capability 
into wireless phones, and allowed carriers to do more 
over-the-air management. For example, if a carrier wanted a 
field trial of a new data service, the carrier could implement 
the service on a server, deliver it to a phone through the 
microbrowser and adjust the service if they found the service 
unsatisfactory. 

Prior Art FIG. 1A is a diagram of prior art mobile 
computing solutions based on web portal networks. In the 
Prior Art, the user 10 must deal separately with each 
participant of the network. In the Prior Art mobile comput- 
ing solution, the user 10 utilizes an Internet service provider 
(ISP) 12 to gain access to a web portal 14. The web portal 
14 accesses third party services 16 which provide information
directly to the user 10. However, in addition to dealing
with the Internet Service Provider 12, the user 10 must
purchase the wireless device from the device manufacturers
or retailers 18. In most cases the user 10 would also have to
purchase the browser from the browser provider 20.
Generally, the user would also have to pay the wireless
communication cost, and so would need to deal with the
phone company 22. Finally, any web purchases would
require the user 10 to deal with the credit card
company 24. A coordinated and packaged service would
clearly be an ideal mobile computing solution.
Furthermore, a coordinated and packaged service which
made use of agents would be highly desirable.

Agent based technology has become increasingly important
for use with applications designed to interact with a user
for performing various computer based tasks in foreground
and background modes. Agent software comprises computer
programs that are set on behalf of users to perform routine,
tedious and time-consuming tasks. To be useful to an
individual user, an agent must be personalized to the
individual user's goals, habits and preferences. Thus, there
exists a substantial requirement for the agent to efficiently
and effectively acquire user-specific knowledge from the
user and utilize it to perform tasks on behalf of the user.

The concept of agency, or the use of agents, is well
established. An agent is a person authorized by another
person, typically referred to as a principal, to act on behalf
of the principal. In this manner the principal empowers the
agent to perform any of the tasks that the principal is
unwilling or unable to perform. For example, an insurance
agent may handle all of the insurance requirements for a
principal, or a talent agent may act on behalf of a performer
to arrange concert dates.

With the advent of the computer, a new domain for
employing agents has arrived. Significant advances in the
realm of expert systems enable computer programs to act on
behalf of computer users to perform routine, tedious and
other time-consuming tasks. These computer programs are
referred to as "software agents."

Moreover, there has been a recent proliferation of computer
and communication networks. These networks permit
a user to access vast amounts of information and services
without, essentially, any geographical boundaries. Thus, a
software agent has a rich environment to perform a large
number of tasks on behalf of a user. For example, it is now
possible for an agent to make an airline reservation, purchase
the ticket, and have the ticket delivered directly to a
user. Similarly, an agent could scan the Internet and obtain
information ranging from the latest sports or news to a
particular graduate thesis in applied physics. Current solutions
fail to apply agent technology to provide targeted
acquisition of information for a user's upcoming events.

SUMMARY OF THE INVENTION 

A system is disclosed that facilitates web-based information
retrieval and display. A wireless phone or similar
hand-held wireless device with Internet Protocol capability
is combined with other peripherals to provide a portable
portal into the Internet. The wireless device prompts a user
to input information of interest to the user. This information
is transmitted as a query to a service routine (running on a Web
server). The service routine then queries the Web to find
price, shipping and availability information from various
Web suppliers. This information is formatted and displayed
on the hand-held device's screen through an interface support
framework. The user may then use the hand-held device
to place an order interactively.

DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, aspects and advantages
are better understood from the following detailed description
of a preferred embodiment of the invention with reference to
the drawings, in which:

Prior Art FIG. 1A is a diagram of Prior Art mobile
computing solutions based on web portal networks;

FIG. 1 is a block diagram of a representative hardware
environment in accordance with a preferred embodiment;

FIG. 2 is a flowchart of the system in accordance with a
preferred embodiment;

FIG. 3 is a flowchart of a parsing unit of the system in
accordance with a preferred embodiment;

FIG. 4 is a flowchart for pattern matching in accordance
with a preferred embodiment;

FIG. 5 is a flowchart for a search unit in accordance with
a preferred embodiment;

FIG. 6 is a flowchart for overall system processing in
accordance with a preferred embodiment;

FIG. 7 is a flowchart of topic processing in accordance
with a preferred embodiment;

FIG. 8 is a flowchart of meeting record processing in
accordance with a preferred embodiment;

FIG. 9 is a block diagram of process flow of a pocket
bargain finder in accordance with a preferred embodiment;

FIGS. 10A and 10B are a block diagram and flowchart
depicting the logic associated with creating a customized
content web page in accordance with a preferred embodiment;

FIG. 11 is a flowchart depicting the detailed logic associated
with retrieving user-centric content in accordance
with a preferred embodiment;

FIG. 12 is a data model of a user profile in accordance
with a preferred embodiment;

FIG. 13 is a persona data model in accordance with a
preferred embodiment;

FIG. 14 is an intention data model in accordance with a
preferred embodiment;

FIG. 15 is a flowchart of the processing for generating an
agent's current statistics in accordance with a preferred
embodiment;

FIG. 16 is a flowchart of the logic that determines the
personalized product rating for a user in accordance with a
preferred embodiment;

FIG. 17 is a flowchart of the logic for accessing the
centrally stored profile in accordance with a preferred
embodiment;

FIG. 18 is a flowchart of the interaction logic between a
user and the integrator for a particular supplier in accordance
with a preferred embodiment;

FIG. 19 is a flowchart of the agent processing for generating
a verbal summary in accordance with a preferred
embodiment;

FIG. 20 illustrates a display login in accordance with a
preferred embodiment;

FIG. 21 illustrates a managing daily logistics display in
accordance with a preferred embodiment;

FIG. 22 illustrates a user main display in accordance with
a preferred embodiment;

FIG. 23 illustrates an agent interaction display in accordance
with a preferred embodiment;

FIG. 24 is a block diagram of an active knowledge
management system in accordance with a preferred embodiment;

FIG. 25 is a block diagram of a back end server in
accordance with a preferred embodiment;

FIG. 26 is a flow chart illustrating how the hardware and
software of one embodiment of the present invention operate;

FIG. 27A illustrates a display of the browser mode in
accordance with a preferred embodiment; and

FIG. 27B is an illustration of a Mobile Portal platform in
accordance with a preferred embodiment.

DETAILED DESCRIPTION 

A preferred embodiment of a system in accordance with 
the present invention is preferably practiced in the context of 
a personal computer such as an IBM compatible personal 
computer, Apple Macintosh computer or UNIX based work- 
station. A representative hardware environment is depicted 
in FIG. 1, which illustrates a typical hardware configuration 
of a workstation in accordance with a preferred embodiment 
having a central processing unit 110, such as a 
microprocessor, and a number of other units interconnected 
via a system bus 112. The workstation shown in FIG. 1 
includes a Random Access Memory (RAM) 114, Read Only 
Memory (ROM) 116, an I/O adapter 118 for connecting 
peripheral devices such as disk storage units 120 to the bus 
112, a user interface adapter 122 for connecting a keyboard 
124, a mouse 126, a speaker 128, a microphone 132, and/or 
other user interface devices such as a touch screen (not 
shown) to the bus 112, communication adapter 134 for 
connecting the workstation to a communication network 
(e.g., a data processing network) and a display adapter 136 
for connecting the bus 112 to a display device 138. The 
workstation typically has resident thereon an operating 
system such as the Microsoft Windows NT or Windows/95 
Operating System (OS), the IBM OS/2 operating system, the 
MAC OS, or UNIX operating system. Those skilled in the 
art will appreciate that the present invention may also be 
implemented on platforms and operating systems other than 
those mentioned. 

A preferred embodiment is written using JAVA, C, and the
C++ language and utilizes object oriented programming 
methodology. Object oriented programming (OOP) has 
become increasingly used to develop complex applications. 
As OOP moves toward the mainstream of software design 
and development, various software solutions require adap- 
tation to make use of the benefits of OOP. A need exists for 
these principles of OOP to be applied to a messaging 
interface of an electronic messaging system such that a set 
of OOP classes and objects for the messaging interface can 
be provided. OOP is a process of developing computer 
software using objects, including the steps of analyzing the 
problem, designing the system, and constructing the pro- 
gram. An object is a software package that contains both 
data and a collection of related structures and procedures. 
Since it contains both data and a collection of structures and 
procedures, it can be visualized as a self-sufficient compo- 
nent that does not require other additional structures, pro- 
cedures or data to perform its specific task. OOP, therefore, 
views a computer program as a collection of largely autono- 
mous components, called objects, each of which is respon- 
sible for a specific task. This concept of packaging data, 
structures, and procedures together in one component or 
module is called encapsulation. 
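
As a minimal sketch of encapsulation, the Python class below packages data together with the procedures that operate on it. The `Counter` class and its method names are hypothetical and chosen only for illustration; they are not part of the described system.

```python
class Counter:
    """Packages data (a running count) with the procedures that use it."""

    def __init__(self) -> None:
        # Internal data; by convention it is reached only through the methods.
        self._count = 0

    def increment(self) -> None:
        """Add one to the internal count."""
        self._count += 1

    def value(self) -> int:
        """Report the current count without exposing the field itself."""
        return self._count


counter = Counter()
counter.increment()
counter.increment()
print(counter.value())  # prints 2
```

Callers work only with `increment` and `value`, so the stored representation could change without affecting them.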

In general, OOP components are reusable software mod- 
ules which present an interface that conforms to an object 
model and which are accessed at run-time through a component
integration architecture. A component integration
architecture is a set of architecture mechanisms which allow
software modules in different process spaces to utilize each
other's capabilities or functions. This is generally done by
assuming a common component object model on which to
build the architecture.

It is worthwhile to differentiate between an object and a 
class of objects at this point. An object is a single instance 
of the class of objects, which is often just called a class. A 
class of objects can be viewed as a blueprint, from which 
many objects can be formed.
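
The distinction can be sketched in a few lines of Python. The `Piston` class and its `diameter_mm` attribute are hypothetical names used only for illustration.

```python
class Piston:
    """Blueprint (class) from which many piston objects can be formed."""

    def __init__(self, diameter_mm: float) -> None:
        self.diameter_mm = diameter_mm


# Two distinct objects (instances) formed from the same class.
small = Piston(70.0)
large = Piston(95.0)

print(type(small) is type(large))  # True: both share one class
print(small is large)              # False: they are separate objects
```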

OOP allows the programmer to create an object that is a 
part of another object. For example, the object representing 
a piston engine is said to have a composition-relationship 
with the object representing a piston. In reality, a piston 
engine comprises a piston, valves and many other components;
the fact that a piston is an element of a piston engine
can be logically and semantically represented in OOP by two 
objects. 
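
A composition-relationship of this kind might be sketched as follows. The class names and constructor parameters are assumptions made for the example.

```python
class Piston:
    """A single piston; one element of an engine."""


class Valve:
    """A single valve; another element of an engine."""


class PistonEngine:
    """An engine object that is composed of piston and valve objects."""

    def __init__(self, piston_count: int, valve_count: int) -> None:
        # The engine owns its parts: each part is an object inside it.
        self.pistons = [Piston() for _ in range(piston_count)]
        self.valves = [Valve() for _ in range(valve_count)]


engine = PistonEngine(piston_count=4, valve_count=8)
print(len(engine.pistons), len(engine.valves))  # prints 4 8
```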

OOP also allows creation of an object that "depends 
from" another object. If there are two objects, one repre- 
senting a piston engine and the other representing a piston 
engine wherein the piston is made of ceramic, then the 
relationship between the two objects is not that of compo- 
sition. A ceramic piston engine does not make up a piston 
engine. Rather it is merely one kind of piston engine that has
one more limitation than the piston engine; its piston is made
of ceramic. In this case, the object representing the ceramic
piston engine is called a derived object, and it inherits all of
the aspects of the object representing the piston engine and
adds further limitation or detail to it. The object representing
the ceramic piston engine "depends from" the object repre- 
senting the piston engine. The relationship between these 
objects is called inheritance. 
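
A minimal sketch of this inheritance relationship, assuming hypothetical class and attribute names, could look like this:

```python
class PistonEngine:
    """Base class: a generic piston engine."""

    piston_material = "metal"

    def __init__(self, piston_count: int) -> None:
        self.piston_count = piston_count

    def describe(self) -> str:
        return f"{self.piston_count}-piston engine with {self.piston_material} pistons"


class CeramicPistonEngine(PistonEngine):
    """Derived class: inherits everything and adds one further limitation."""

    piston_material = "ceramic"


engine = CeramicPistonEngine(6)
print(isinstance(engine, PistonEngine))  # True: it is one kind of piston engine
print(engine.describe())                 # 6-piston engine with ceramic pistons
```

The derived class does not redefine `__init__` or `describe`; it reuses them from the base class and only narrows the piston material.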

When the object or class representing the ceramic piston
engine inherits all of the aspects of the object representing
the piston engine, it inherits the thermal characteristics of a
standard piston defined in the piston engine class. However,
the ceramic piston engine object overrides these with ceramic
specific thermal characteristics, which are typically different
from those associated with a metal piston. It skips over the
original and uses new functions related to ceramic pistons.
Different kinds of piston engines have different
characteristics, but may have the same underlying functions
associated with them (e.g., how many pistons in the engine,
ignition sequences, lubrication, etc.). To access each of these
functions in any piston engine object, a programmer would
call the same functions with the same names, but each type
of piston engine may have different/overriding implementations
of functions behind the same name. This ability to
hide different implementations of a function behind the same
name is called polymorphism and it greatly simplifies
communication among objects.
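
The overriding behavior described above can be sketched as follows. The method name `thermal_characteristics` and the returned strings are illustrative assumptions, not data from the described system.

```python
class PistonEngine:
    def __init__(self, piston_count: int) -> None:
        self.piston_count = piston_count

    def thermal_characteristics(self) -> str:
        return "standard metal-piston thermal characteristics"


class CeramicPistonEngine(PistonEngine):
    def thermal_characteristics(self) -> str:
        # Overrides the inherited implementation behind the same name.
        return "ceramic-specific thermal characteristics"


def report(engine: PistonEngine) -> str:
    # The caller uses one function name for every kind of engine.
    return f"{engine.piston_count} pistons, {engine.thermal_characteristics()}"


for engine in (PistonEngine(4), CeramicPistonEngine(4)):
    print(report(engine))
```

`report` never checks which kind of engine it was given; the object itself selects the implementation.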

With the concepts of composition-relationship,
encapsulation, inheritance and polymorphism, an object can
represent just about anything in the real world. In fact, our
logical perception of reality is the only limit on determining
the kinds of things that can become objects in
object-oriented software. Some typical categories are as
follows:

Objects can represent physical objects, such as automobiles
in a traffic-flow simulation, electrical components
in a circuit-design program, countries in an economics
model, or aircraft in an air-traffic-control system.

Objects can represent elements of the computer-user
environment such as windows, menus or graphics
objects.

An object can represent an inventory, such as a personnel
file or a table of the latitudes and longitudes of cities.

An object can represent user-defined data types such as
time, angles, and complex numbers, or points on the
plane.
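
As one illustration of the last category, a point on the plane can be sketched as a user-defined data type. The `Point` class below is a hypothetical example, not part of the described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A user-defined data type: a point on the plane."""

    x: float
    y: float

    def __add__(self, other: "Point") -> "Point":
        # Component-wise addition, so points combine like built-in numbers.
        return Point(self.x + other.x, self.y + other.y)


total = Point(1.0, 2.0) + Point(3.0, 4.0)
print(total)  # Point(x=4.0, y=6.0)
```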

With this enormous capability of an object to represent 
just about any logically separable matters, OOP allows the 
software developer to design and implement a computer 
program that is a model of some aspects of reality, whether 
that reality is a physical entity, a process, a system, or a 
composition of matter. Since the object can represent 
anything, the software developer can create an object which 
can be used as a component in a larger software project in 
the future. 

If 90% of a new OOP software program consists of 
proven, existing components made from preexisting reus- 
able objects, then only the remaining 10% of the new 
software project has to be written and tested from scratch. 
Since 90% already came from an inventory of extensively 
tested reusable objects, the potential domain from which an 
error could originate is 10% of the program. As a result, 
OOP enables software developers to build objects out of 
other, previously built, objects. 

This process closely resembles complex machinery being 
built out of assemblies and sub-assemblies. OOP 
technology, therefore, makes software engineering more like 
hardware engineering in that software is built from existing 
components, which are available to the developer as objects. 
All this adds up to an improved quality of the software as 
well as an increased speed of its development. 

Programming languages are beginning to fully support the 
OOP principles, such as encapsulation, inheritance, 
polymorphism, and composition-relationship. With the 
advent of the C++ language, many commercial software 
developers have embraced OOP. C++ is an OOP language 
that offers fast, machine-executable code. Furthermore,
C++ is suitable for both commercial-application and
systems-programming projects. For now, C++ appears to be
the most popular choice among many OOP programmers,
but there is a host of other OOP languages, such as
Smalltalk, Common Lisp Object System (CLOS), and Eiffel.
Additionally, OOP capabilities are being added to more 
traditional popular computer programming languages such 
as Pascal. 

The benefits of object classes can be summarized, as 
follows: 

Objects and their corresponding classes break down com- 
plex programming problems into many smaller, sim- 
pler problems. 

Encapsulation enforces data abstraction through the orga- 
nization of data into small, independent objects that can 
communicate with each other. Encapsulation protects 
the data in an object from accidental damage, but 
allows other objects to interact with that data by calling 
the object's member functions and structures. 

Subclassing and inheritance make it possible to extend 
and modify objects through deriving new kinds of 
objects from the standard classes available in the sys- 
tem. Thus, new capabilities are created without having 
to start from scratch. 

Polymorphism and multiple inheritance make it possible 
for different programmers to mix and match character- 
istics of many different classes and create specialized 
objects that can still work with related objects in 
predictable ways. 

Class hierarchies and containment hierarchies provide a
flexible mechanism for modeling real-world objects
and the relationships among them.

Libraries of reusable classes are useful in many situations,
but they also have some limitations. For example:

Complexity. In a complex system, the class hierarchies for
related classes can become extremely confusing, with
many dozens or even hundreds of classes.

Flow of control. A program written with the aid of class
libraries is still responsible for the flow of control (i.e.,
it must control the interactions among all the objects
created from a particular library). The programmer has
to decide which functions to call at what times for
which kinds of objects.

Duplication of effort. Although class libraries allow
programmers to use and reuse many small pieces of code,
each programmer puts those pieces together in a different
way. Two different programmers can use the
same set of class libraries to write two programs that do
exactly the same thing but whose internal structure
(i.e., design) may be quite different, depending on
hundreds of small decisions each programmer makes
along the way. Inevitably, similar pieces of code end up
doing similar things in slightly different ways and do
not work as well together as they should.

Class libraries are very flexible. As programs grow more
complex, more programmers are forced to reinvent basic
solutions to basic problems over and over again. A relatively
new extension of the class library concept is to have a
framework of class libraries. This framework is more complex
and consists of significant collections of collaborating
classes that capture both the small scale patterns and major
mechanisms that implement the common requirements and
design in a specific application domain. Frameworks were first
developed to free application programmers from the chores
involved in displaying menus, windows, dialog boxes, and
other standard user interface elements for personal computers.

Frameworks also represent a change in the way program- 
mers think about the interaction between the code they write 
and code written by others. In the early days of procedural 
programming, the programmer called libraries provided by 
the operating system to perform certain tasks, but basically
the program executed down the page from start to finish, and
the programmer was solely responsible for the flow of
control. This was appropriate for printing out paychecks,
calculating a mathematical table, or solving other problems
with a program that executed in just one way.

The development of graphical user interfaces began to
turn this procedural programming arrangement inside out.
These interfaces allow the user, rather than program logic, to
drive the program and decide when certain actions should be
performed. Today, most personal computer software accomplishes
this by means of an event loop which monitors the
mouse, keyboard, and other sources of external events and
calls the appropriate parts of the programmer's code according
to actions that the user performs. The programmer no
longer determines the order in which events occur. Instead,
a program is divided into separate pieces that are called at
unpredictable times and in an unpredictable order. By relinquishing
control in this way to users, the developer creates
a program that is much easier to use. Nevertheless, individual
pieces of the program written by the developer still
call libraries provided by the operating system to accomplish
certain tasks, and the programmer must still determine the
flow of control within each piece after being called by the
event loop. Application code still "sits on top of" the system.
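
By way of illustration only, the event loop arrangement described above can be sketched as follows. The sketch is not part of any embodiment; the event names, the handler functions, and `run_event_loop` are hypothetical and serve only to show that the loop, rather than the programmer's code, decides which piece of the program is called next.

```python
from collections import deque


def run_event_loop(events, handlers):
    """Dispatch each queued event to the handler registered for its kind.

    Events with no registered handler are ignored. The return value is the
    list of results produced by the handlers, in the order events arrived.
    """
    results = []
    queue = deque(events)
    while queue:
        kind, payload = queue.popleft()
        handler = handlers.get(kind)
        if handler is not None:
            results.append(handler(payload))
    return results


# Hypothetical pieces of "programmer's code", each called by the loop.
def on_click(position):
    return f"clicked at {position}"


def on_key(character):
    return f"typed {character!r}"


HANDLERS = {"click": on_click, "key": on_key}

# The order of events is determined by the user, not by program logic.
result = run_event_loop(
    [("key", "a"), ("click", (3, 4)), ("scroll", 1)], HANDLERS
)
```

Each handler still decides its own internal flow of control, which is the limitation the passage above attributes to event loop programs.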

Even event loop programs require programmers to write
a lot of code that should not need to be written separately for
every application. The concept of an application framework
carries the event loop concept further. Instead of dealing
with all the nuts and bolts of constructing basic menus,
windows, and dialog boxes and then making these things all
work together, programmers using application frameworks
start with working application code and basic user interface
elements in place. Subsequently, they build from there by
replacing some of the generic capabilities of the framework
with the specific capabilities of the intended application.

Application frameworks reduce the total amount of code 
that a programmer has to write from scratch. However, 
because the framework is really a generic application that 
displays windows, supports copy and paste, and so on, the 
programmer can also relinquish control to a greater degree 
than event loop programs permit. The framework code takes 
care of almost all event handling and flow of control, and the 
programmer's code is called only when the framework 
needs it (e.g., to create or manipulate a proprietary data 
structure). 

A programmer writing a framework program not only 
relinquishes control to the user (as is also true for event loop 
programs), but also relinquishes the detailed flow of control 
within the program to the framework. This approach allows 
the creation of more complex systems that work together in 
interesting ways, as opposed to isolated programs, having 
custom code, being created over and over again for similar 
problems. 

Thus, as is explained above, a framework basically is a 
collection of cooperating classes that make up a reusable 
design solution for a given problem domain. It typically 
includes objects that provide default behavior (e.g., for 
menus and windows), and programmers use it by inheriting 
some of that default behavior and overriding other behavior 
so that the framework calls application code at the appro- 
priate times. 

There are three main differences between frameworks and 
class libraries: 

Behavior versus protocol. Class libraries are essentially 
collections of behaviors that you can call when you 
want those individual behaviors in your program. A 
framework, on the other hand, provides not only behav- 
ior but also the protocol or set of rules that govern the 
ways in which behaviors can be combined, including 
rules for what a programmer is supposed to provide 
versus what the framework provides. 

Call versus override. With a class library, the code the
programmer writes instantiates objects and calls their member
functions. It's possible to instantiate and call objects in
the same way with a framework (i.e., to treat the 
framework as a class library), but to take full advantage 
of a framework's reusable design, a programmer typi- 
cally writes code that overrides and is called by the 
framework. The framework manages the flow of con- 
trol among its objects. Writing a program involves 
dividing responsibilities among the various pieces of 
software that are called by the framework rather than 
specifying how the different pieces should work 
together. 

Implementation versus design. With class libraries, pro- 
grammers reuse only implementations, whereas with 
frameworks, they reuse design. A framework embodies 
the way a family of related programs or pieces of 
software work. It represents a generic design solution 
that can be adapted to a variety of specific problems in 
a given domain. For example, a single framework can 
embody the way a user interface works, even though 
two different user interfaces created with the same 
framework might solve quite different interface prob- 
lems. 
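
By way of illustration only, the "call versus override" distinction can be sketched as follows. All names here (`word_count`, `ReportFramework`, `WordCountReport`) are hypothetical and are not taken from any embodiment. The free function plays the role of a class library that application code calls; the base class plays the role of a framework that owns the flow of control and calls the application's override.

```python
def word_count(text):
    """Library-style behavior: the application calls it when it wants it."""
    return len(text.split())


class ReportFramework:
    """Framework-style design: run() fixes the order in which pieces execute."""

    def run(self, text):
        # The framework, not the application, manages the flow of control.
        return "\n".join([self.header(), self.body(text), self.footer()])

    def header(self):
        return "== Report =="  # default behavior, inherited as-is

    def body(self, text):
        raise NotImplementedError("applications override body()")

    def footer(self):
        return "== End =="  # default behavior, inherited as-is


class WordCountReport(ReportFramework):
    """Application code: overrides one hook and is called by the framework."""

    def body(self, text):
        return f"words: {word_count(text)}"


report = WordCountReport().run("a b c")
```

The application never calls `header` or `footer` itself; it supplies only the piece the framework's protocol asks for.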
Thus, through the development of frameworks for solutions
to various problems and programming tasks, significant
reductions in the design and development effort for
software can be achieved. A preferred embodiment of the
invention utilizes HyperText Markup Language (HTML) to
implement documents on the Internet together with a
general-purpose secure communication protocol for a transport
medium between the client and the Newco. HTTP or
other protocols could be readily substituted for HTML
without undue experimentation. Information on these products
is available in T. Berners-Lee, D. Connolly, "RFC 1866:
Hypertext Markup Language - 2.0" (November 1995); and
R. Fielding, H. Frystyk, T. Berners-Lee, J. Gettys and J. C.
Mogul, "Hypertext Transfer Protocol -- HTTP/1.1: HTTP
Working Group Internet Draft" (May 2, 1996). HTML is a
simple data format used to create hypertext documents that
are portable from one platform to another. HTML documents
are SGML documents with generic semantics that are
appropriate for representing information from a wide range
of domains. HTML has been in use by the World-Wide Web
global information initiative since 1990. HTML is an application
of ISO Standard 8879:1986 Information Processing
Text and Office Systems; Standard Generalized Markup
Language (SGML).

To date, Web development tools have been limited in their
ability to create dynamic Web applications which span from
client to server and interoperate with existing computing
resources. Until recently, HTML has been the dominant
technology used in development of Web-based solutions.
However, HTML has proven to be inadequate in the following
areas:

Poor performance; 

Restricted user interface capabilities; 

Can only produce static Web pages; 

Lack of interoperability with existing applications and 
data; and 

Inability to scale. 

Sun Microsystems' Java language solves many of the
client-side problems by:

Improving performance on the client side;

Enabling the creation of dynamic, real-time Web applications; and

Providing the ability to create a wide variety of user
interface components.

With Java, developers can create robust User Interface
(UI) components. Custom "widgets" (e.g., real-time stock
tickers, animated icons, etc.) can be created, and client-side
performance is improved. Unlike HTML, Java supports the
notion of client-side validation, offloading appropriate processing
onto the client for improved performance. Dynamic,
real-time Web pages can be created. Using the above-mentioned
custom UI components, dynamic Web pages can
also be created.

Sun's Java language has emerged as an industry-recognized
language for "programming the Internet." Sun
defines Java as: "a simple, object-oriented, distributed,
interpreted, robust, secure, architecture-neutral, portable,
high-performance, multithreaded, dynamic, buzzword-compliant,
general-purpose programming language. Java
supports programming for the Internet in the form of
platform-independent Java applets." Java applets are small,
specialized applications that comply with Sun's Java Application
Programming Interface (API) allowing developers to
add "interactive content" to Web documents (e.g., simple
animations, page adornments, basic games, etc.). Applets
execute within a Java-compatible browser (e.g., Netscape
Navigator) by copying code from the server to client. From
a language standpoint, Java's core feature set is based on
C++. Sun's Java literature states that Java is basically "C++,
with extensions from Objective C for more dynamic method
resolution".

Another technology that provides similar function to
JAVA is provided by Microsoft and ActiveX Technologies,
to give developers and Web designers the wherewithal to build
dynamic content for the Internet and personal computers.
ActiveX includes tools for developing animation, 3-D vir- 
tual reality, video and other multimedia content. The tools 
use Internet standards, work on multiple platforms, and are 
being supported by over 100 companies. The group's build- 
ing blocks are called ActiveX Controls, small, fast compo- 
nents that enable developers to embed parts of software in 
hypertext markup language (HTML) pages. ActiveX Con- 
trols work with a variety of programming languages includ- 
ing Microsoft Visual C++, Borland Delphi, Microsoft Visual 
Basic programming system and, in the future, Microsoft's 
development tool for Java, code named "Jakarta." ActiveX 
Technologies also includes ActiveX Server Framework, 
allowing developers to create server applications. One of 
ordinary skill in the art readily recognizes that ActiveX 
could be substituted for JAVA without undue experimenta- 
tion to practice the invention. 

In accordance with a preferred embodiment, Background- 
Finder (BF) is implemented as an agent responsible for 
preparing an individual for an upcoming meeting by helping 
him/her retrieve relevant information about the meeting 
from various sources. BF receives input text in character 
form indicative of the target meeting. The input text is 
generated in accordance with a preferred embodiment by a 
calendar program that includes the time of the meeting. As 
the time of the meeting approaches, the calendar program is 
queried to obtain the text of the target event and that 
information is utilized as input to the agent. Then, the agent 
parses the input meeting text to extract its various compo- 
nents such as title, body, participants, location, time etc. The 
system also performs pattern matching to identify particular 
meeting fields in a meeting text. This information is utilized 
to query various sources of information on the web and 
obtain relevant stories about the current meeting to send 
back to the calendaring system. For example, if an individual
has a meeting with Netscape and Microsoft to talk
about their disputes, the system would obtain this initial
information from the calendaring system. It will then parse out the
text to realize that the companies in the meeting are
"Netscape" and "Microsoft" and the topic is "disputes."
Then, the system queries the web for relevant information
concerning the topic. Thus, in accordance with an objective
of the invention, the system updates the calendaring system
and eventually the user with the best information it can
gather to prepare the user for the target meeting. In accordance
with a preferred embodiment, the information is
stored in a file that is obtained via selection from a link
embedded in the calendar system.

Program Organization 

A computer program in accordance with a preferred
embodiment is organized in five distinct modules: BF.Main,
BF.Parse, BackgroundFinder.Error, BF.PatternMatching
and BF.Search. There is also a frmMain which provides a
user interface used only for debugging purposes. The
executable programs in accordance with a preferred embodiment
never execute with the user interface and should only
return to the calendaring system through Microsoft's Winsock
control. A preferred embodiment of the system
executes in two different modes which can be specified
under the command line sent to it by the calendaring system.
When the system runs in simple mode, it executes a keyword
query to submit to external search engines. When executed
in complex mode, the system performs pattern matching
before it forms a query to be sent to a search engine.

Data Structures

The system in accordance with a preferred embodiment
utilizes three user defined structures:

1. tMeetingRecord;

2. tAPatternElement; and

3. tAPatternRecord.

The user-defined structure, tMeetingRecord, is used to
store all the pertinent information concerning a single meeting.
This info includes userID, an original description of the
meeting, the extracted list of keywords from the title and
body of meeting etc. It is important to note that only one
meeting record is created per instance of the system in
accordance with a preferred embodiment. This is because
each time the system is spawned to service an upcoming
meeting, it is assigned a task to retrieve information for only
one meeting. Therefore, the meeting record created corresponds
to the current meeting examined. ParseMeetingText
populates this meeting record and it is then passed around to
provide information about the meeting to other functions. If
GoPatternMatch can bind any values to a particular meeting
field, the corresponding entries in the meeting record are also
updated. The structure of tMeetingRecord with each field
described in parentheses is provided below in accordance
with a preferred embodiment.

Public Type tMeetingRecord
sUserID As String (user id given by Munin)
sTitleOrig As String (original non-stoplisted title we need to keep around to send back to Munin)
sTitleKW As String (stoplisted title with only keywords)
sBodyKW As String (stoplisted body with only keywords)
sCompany() As String (companies identified in title or body through pattern matching)
sTopic() As String (topics identified in title or body through pattern matching)
sPeople() As String (people identified in title or body through pattern matching)
sWhen() As String (time identified in title or body through pattern matching)
sWhere() As String (location identified in title or body through pattern matching)
sLocation As String (location as passed in by Munin)
sTime As String (time as passed in by Munin)
sParticipants() As String (all participants engaged as passed in by Munin)
sMeetingText As String (the original meeting text w/o userid)
End Type



There are two other structures which are created to hold
each individual pattern utilized in pattern matching. The
record tAPatternRecord is an array containing all the
components/elements of a pattern. The type tAPatternElement
is an array of strings which represent an element in a
pattern. Because there may be many "substitutes" for each
element, we need an array of strings to keep track of what
all the substitutes are. The structures of tAPatternElement
and tAPatternRecord are presented below in accordance
with a preferred embodiment.

Public Type tAPatternElement
elementArray() As String
End Type

Public Type tAPatternRecord
patternArray() As tAPatternElement
End Type
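
By way of illustration only, the three structures can be mirrored in a short sketch. The class and attribute names below are hypothetical equivalents chosen for readability and are not the identifiers of the preferred embodiment. The example pattern assumes that "of" and "from" are substitutes within a single element.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PatternElement:
    """Counterpart of tAPatternElement: the substitutes for one element."""
    substitutes: List[str]


@dataclass
class PatternRecord:
    """Counterpart of tAPatternRecord: the ordered elements of one pattern."""
    elements: List[PatternElement]


@dataclass
class MeetingRecord:
    """Counterpart of tMeetingRecord: everything known about one meeting."""
    user_id: str = ""
    title_orig: str = ""
    title_kw: str = ""
    body_kw: str = ""
    companies: List[str] = field(default_factory=list)
    topics: List[str] = field(default_factory=list)
    people: List[str] = field(default_factory=list)
    when: List[str] = field(default_factory=list)
    where: List[str] = field(default_factory=list)
    location: str = ""
    time: str = ""
    participants: List[str] = field(default_factory=list)
    meeting_text: str = ""


# One pattern with three elements; the middle element has two substitutes.
people_of_company = PatternRecord(
    elements=[
        PatternElement(["$PEOPLE$"]),
        PatternElement(["of", "from"]),
        PatternElement(["$COMPANY$"]),
    ]
)

record = MeetingRecord(user_id="09", title_orig="Meet with Chad")
record.people.append("Chad")
```

Each list-valued field starts empty and is filled only if pattern matching binds a value, which matches the description of the meeting record being updated after it is first populated.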



Common User Defined Constants 



Many constants are defined in each declaration section of
the program which may need to be updated periodically as
part of the process of maintaining the system in accordance
with a preferred embodiment. The constants are accessible
to allow dynamic configuration of the system to occur as
updates for maintaining the code.

Included in the following tables are lists of constants from
each module which I thought are most likely to be modified
from time to time. However, there are also other constants
used in the code not included in the following list. It does not
mean that these non-included constants will never be
changed. It means that they will change much less frequently.



For the Main Module (BF.Main):

CONSTANT              PRESET VALUE   USE

MSGTOMUNIN_TYPE       6              Define the message number used to identify
                                     messages between BF and Munin.

IP_ADDRESS_MUNIN      "10.2. (remainder illegible)
                                     Define the IP address of the machine in which
                                     Munin and BF are running on so they can
                                     transfer data through UDP.

PORT_MUNIN            7777           Define the remote port in which we are
                                     operating on.

TIMEOUT_AV            60             Define constants for setting time out in inet
                                     controls.

TIMEOUT_NP            60             Define constants for setting time out in inet
                                     controls.

CMD_SEPARATOR         "\"            Define delimiter to tell which part of Munin's
                                     command represents the beginning of our input
                                     meeting text.

OUTPARAM_SEPARATOR    (not legible)  Define delimiter for separating out different
                                     portions of the output. The separator is for
                                     delimiting the msg type, the user id, the
                                     meeting title and the beginning of the actual
                                     stories retrieved.



For the Search Module (BF.Search):

CONSTANT              CURRENT VALUE  USE

PAST_NDAYS            5              Define number of days you want to look back
                                     for AltaVista articles. Doesn't really matter
                                     now because we aren't really doing a news
                                     search in AltaVista. We want all info.

CONNECTOR_AV_URL      "+AND+"        Define how to connect keywords. We want all
                                     our keywords in the string so for now use AND.
                                     If you want to do an OR or something, just
                                     change connector.

CONNECTOR_NP_URL      "+AND+"        Define how to connect keywords. We want all
                                     our keywords in the string so for now use AND.
                                     If you want to do an OR or something, just
                                     change connector.

NUM_NP_STORIES        3              Define the number of stories to return back to
                                     Munin from NewsPage.

NUM_AV_STORIES        3              Define the number of stories to return back to
                                     Munin from AltaVista.


For the Parse Module (BF.Parse):

CONSTANT               CURRENT VALUE  USE

PORTION_SEPARATOR      "::"           Define the separator between different
                                      portions of the meeting text sent in by Munin.
                                      For example, in "09::Meet with Chad::about
                                      life::Chad|Denise", "::" is the separator
                                      between different parts of the meeting text.

PARTICIPANT_SEPARATOR  "|"            Define the separator between each participant
                                      in the participant list portion of the
                                      original meeting text. Refer to example above.
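
By way of illustration only, the use of the two separators can be sketched as follows. The sketch assumes that the portion separator is "::" and that the example meeting text carries exactly four portions in the order user id, title, body and participants; the actual meeting text may carry further portions (e.g., location and time). The function name `parse_meeting_text` is hypothetical and is not the ParseMeetingText routine of the preferred embodiment.

```python
PORTION_SEPARATOR = "::"      # assumed value, inferred from the example text
PARTICIPANT_SEPARATOR = "|"


def parse_meeting_text(text):
    """Split a meeting text into its portions (four-portion layout assumed)."""
    portions = text.split(PORTION_SEPARATOR)
    if len(portions) != 4:
        raise ValueError(f"expected 4 portions, got {len(portions)}")
    user_id, title, body, participants = portions
    return {
        "user_id": user_id,
        "title": title,
        "body": body,
        # Drop empty names so a trailing separator does not add a blank entry.
        "participants": [
            name for name in participants.split(PARTICIPANT_SEPARATOR) if name
        ],
    }


meeting = parse_meeting_text("09::Meet with Chad::about life::Chad|Denise")
```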



For the Pattern Matching Module (BF.PatternMatch): There
are no constants in this module which require frequent
updates.

General Process Flow 

The best way to depict the process flow and the coordination
of functions between each other is with the five
flowcharts illustrated in FIGS. 2 to 6. FIG. 2 depicts the
overall process flow in accordance with a preferred embodiment.
Processing commences at the top of the chart at
function block 200 which launches when the program starts.
Once the application is started, the command line is parsed
to remove the appropriate meeting text to initiate the target
of the background find operation in accordance with a
preferred embodiment as shown in function block 210. A
global stop list is generated after the target is determined as
shown in function block 220. Then, all the patterns that are
utilized for matching operations are generated as illustrated
in function block 230. Then, by tracing through the chart,
function block 200 invokes GoBF 240 which is responsible
for logical processing associated with wrapping the correct
search query information for the particular target search
engine. For example, function block 240 flows to function
block 250 and it then calls GoPatternMatch as shown in
function block 260. To see the process flow of
GoPatternMatch, we swap to the diagram titled "Process
Flow for BF's Pattern Matching Unit."

One key thing to notice is that functions depicted at the
same level of the chart are called in sequential order from
left to right (or top to bottom) by their common parent
function. For example, Main 200 calls ProcessCommandLine
210, then CreateStopList 220, then CreatePatterns
230, then GoBackgroundFinder 240. FIGS. 3 to 6
detail the logic for the entire program, the parsing unit, the
pattern matching unit and the search unit respectively. FIG.
6 details the logic determinative of data flow of key information
through BackgroundFinder, and shows the functions
that are responsible for creating or processing such information.



Detailed Search Architecture Under the Simple
Query Mode

Search Alta Vista
(Function Block 270 of FIG. 2)

The Alta Vista search engine identifies and
returns general information about topics related to the current
meeting as shown in function block 270 of FIG. 2. The
system in accordance with a preferred embodiment takes all
the keywords from the title portion of the original meeting
text and constructs an advanced query to send to Alta Vista.
The keywords are logically combined together in the query.
The results are also ranked based on the same set of
keywords. One of ordinary skill in the art will readily
comprehend that a date restriction or publisher criteria could
be facilitated on the articles we want to retrieve. A set of top
ranking stories are returned to the calendaring system in
accordance with a preferred embodiment.
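
By way of illustration only, the logical combination of title keywords can be sketched as follows. The sketch assumes that keywords are joined with the "+AND+" connector and URL-encoded individually; the function name `build_query` is hypothetical, and no search engine address or request format is implied.

```python
from urllib.parse import quote_plus

CONNECTOR_AV_URL = "+AND+"  # connector value assumed for this sketch


def build_query(keywords, connector=CONNECTOR_AV_URL):
    """Join URL-encoded keywords with the connector to form a query string."""
    return connector.join(quote_plus(keyword) for keyword in keywords)


query = build_query(["Netscape", "Microsoft", "disputes"])
```

Changing the connector argument (for example, to "+OR+") changes how the keywords are combined without touching the rest of the code, which is the purpose of keeping the connector as a constant.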



News Page
(Function Block 275 of FIG. 2)

The NewsPage search system is responsible for giving us
the latest news topics related to a target meeting. The system
takes all of the keywords from the title portion of the original
meeting text and constructs a query to send to the NewsPage
search engine. The keywords are logically combined
together in the query. Only articles published recently are
retrieved. The NewsPage search system provides a date
restriction criterion that is settable by a user according to the
user's preference. The top ranking stories are returned to the
calendaring system.

FIG. 3 is a user profile data model in accordance with a
preferred embodiment. Processing commences at function
block 300 which is responsible for invoking the program
from the main module. Then, at function block 310, a
wrapper function is invoked to prepare for the keyword
extraction processing in function block 320. After the keywords
are extracted, processing flows to function block
330 to determine if the delimiters are properly positioned.
Then, at function block 340, the number of words in a
particular string is calculated and the delimiters for the
particular field are determined, and a particular field from the meeting
text is retrieved at function block 350. Then, at function
block 380, the delimiters of the string are again checked to
assure they are placed appropriately. Finally, at function
block 360, the extraction of each word from the title and
body of the message is performed a word at a time utilizing
the logic in function block 362 which finds the next closest
word delimiter in the input phrase, function block 364 which
strips unnecessary materials from a word and function block
366 which determines if a word is on the stop list and returns
an error if the word is on the stop list.
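
By way of illustration only, the word-at-a-time extraction of function blocks 362 to 366 can be sketched as follows. The stop list contents are hypothetical, whitespace is assumed to be the word delimiter, and the sketch simply skips a stop-listed word where the description above speaks of returning an error.

```python
import string

# Hypothetical stop list; the actual list is not reproduced in this text.
STOP_LIST = {"meet", "with", "and", "to", "talk", "about", "the", "a", "of"}


def extract_keywords(phrase, stop_list=STOP_LIST):
    """Return the words of phrase that survive stripping and the stop list."""
    keywords = []
    for raw_word in phrase.split():                # find the next word
        word = raw_word.strip(string.punctuation)  # strip unnecessary material
        if word and word.lower() not in stop_list:  # check the stop list
            keywords.append(word)
    return keywords


keywords = extract_keywords(
    "Meet with Netscape and Microsoft to talk about disputes."
)
```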

Pattern Matching in Accordance with a Preferred 
Embodiment 

The limitations associated with a simple searching 
method include the following: 

1. Because it relies on a stoplist of unwanted words in 
order to extract from the meeting text a set of 
keywords, it is limited by how comprehensive the 
stoplist is. Instead of trying to figure out what parts of 
the meeting text we should throw away, we should 
focus on what parts of the meeting text we want. 

2. A simple search method in accordance with a preferred 
embodiment only uses the keywords from a meeting 
title to form queries to send to Alta Vista and News- 
Page. This ignores an alternative source of information 
for the query, the body of the meeting notice. We cannot 
include the keywords from the meeting body to form 
our queries because this often results in queries which 
are too long and so complex that we often obtain no 
meaningful results. 

3. There is no way for us to tell what each keyword
represents. For example, we may extract "Andy" and
"Grove" as two keywords. However, a simplistic
search has no way of knowing that "Andy Grove" is in
fact a person's name. Imagine the possibilities if we
could somehow intelligently guess that "Andy Grove"
is a person's name. We could then retrieve information
such as where he is employed and where he currently resides.

4. In summary, by relying solely on a stoplist to parse out 
unnecessary words, we suffer from "information over- 
load". 

Pattern Matching Overcomes these Limitations in 
Accordance with a Preferred Embodiment 

Here is how the pattern matching system can address each 
of the corresponding issues above in accordance with a 
preferred embodiment. 

1. By doing pattern matching, we match up only parts of the 
meeting text that we want and extract those parts. 

2. By performing pattern matching on the meeting body, we
extract only the parts of the meeting body that we
want, so the meeting body does not go to complete waste.

3. Pattern matching is based on a set of templates that we
specify, allowing us to identify people's names, company
names and other items from a meeting text.

4. In summary, with pattern matching, we no longer suffer 
from information overload. Of course, the big problem is 
how well our pattern matching works. If we rely exclu- 
sively on artificial intelligence processing, we do not have 
a 100% hit rate. We are able to identify about 20% of all 
company names presented to us. 

Patterns 

A pattern in the context of a preferred embodiment is a 
template specifying the structure of a phrase we are looking 
for in a meeting text. The patterns supported by a preferred
embodiment are selected because they are templates of
phrases which have a high probability of appearing in
someone's meeting text. For example, when entering a
meeting in a calendar, many would write something such as
"Meet with Bob Dutton from Stanford University next
Tuesday." A common pattern would then be something like
the word "with" followed by a person's name (in this
example it is Bob Dutton) followed by the word "from" and
ending with an organization's name (in this case, it is
Stanford University).

The supported patterns are organized into pattern groups. All
patterns within a pattern group share a similar format and
differ from each other only in which indicators are used
as substitutes. Note that the patterns which are grayed out
in the table of supported patterns are also commented out in
the code. BF has the capability to support these patterns but
we decided that matching these patterns is not essential at
this point.


Pattern Matching Terminology 

The common terminology associated with pattern match- 
ing is provided below. 

Pattern: a pattern is a template specifying the structure of
a phrase we want to bind the meeting text to. It contains
sub-units.

Element: a pattern can contain many sub-units. These
sub-units are called elements. For example, in the pattern
"with $PEOPLE$ from $COMPANY$", "with",
"$PEOPLE$", "from" and "$COMPANY$" are all elements.

Placeholder: a placeholder is a special kind of element
to which we want to bind a value. Using the above
example, "$PEOPLE$" is a placeholder.

Indicator: an indicator is another kind of element which
we want to find in a meeting text but no value needs to
bind to it. There is often more than one indicator
we are looking for in a certain pattern. That is why an
indicator is not an "atomic" type.

Substitute: substitutes are a set of indicators which are all
synonyms of each other. Finding any one of them in the
input is good.

There are five fields which are identified for each meeting:

♦ Company ($COMPANY$)

♦ People ($PEOPLE$)

♦ Location ($LOCATION$)

♦ Time ($TIME$)

♦ Topic ($TOPIC_UPPER$) or ($TOPIC_ALL$)

In parentheses are the placeholders I used in my code as
representation of the corresponding meeting fields.
Each placeholder has the following meaning:

$COMPANY$: binds a string of capitalized words (e.g.,
Meet with Joe Carter of <Andersen Consulting>)

$PEOPLE$: binds series of string of two capitalized
words potentially connected by "and" or "&" (e.g.,
Meet with <Joe Carter> of Andersen Consulting, Meet
with <Joe Carter and Luke Hughes> of Andersen
Consulting)

$LOCATION$: binds a string of capitalized words (e.g.,
Meet Susan at <Palo Alto Square>)

$TIME$: binds a string containing the format #:## (e.g.,
Dinner at <6:30 pm>)

$TOPIC_UPPER$: binds a string of capitalized words
for our topic (e.g., <Stanford Engineering Recruiting>
Meeting to talk about new hires).

$TOPIC_ALL$: binds a string of words without really
caring if it's capitalized or not. (e.g., Meet to talk about
<ubiquitous computing>)
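
By way of illustration only, the binding of placeholders can be sketched with regular expressions. This is a substituted technique chosen for brevity and is not asserted to be how the preferred embodiment performs its matching. The expressions and function names below are hypothetical; a "capitalized word" is assumed to be one uppercase letter followed by lowercase letters.

```python
import re

CAP = r"[A-Z][a-z]+"                           # one capitalized word (assumed)
NAME = rf"{CAP}\s+{CAP}"                       # two capitalized words
PEOPLE = rf"{NAME}(?:\s+(?:and|&)\s+{NAME})*"  # names joined by "and" or "&"
COMPANY = rf"{CAP}(?:\s+{CAP})*"               # a string of capitalized words
TIME = r"\d{1,2}:\d{2}(?:\s*[ap]m)?"           # the #:## format, optional am/pm

# "$PEOPLE$ of $COMPANY$" with "from" as a substitute for "of".
PEOPLE_OF_COMPANY = re.compile(
    rf"(?P<people>{PEOPLE})\s+(?:of|from)\s+(?P<company>{COMPANY})"
)
# "at $TIME$" with "around" as a substitute for "at".
AT_TIME = re.compile(rf"\b(?:at|around)\s+(?P<time>{TIME})", re.IGNORECASE)


def bind_people_company(text):
    """Return the bound people and company, or None if the pattern fails."""
    match = PEOPLE_OF_COMPANY.search(text)
    return match.groupdict() if match else None


def bind_time(text):
    """Return the bound time string, or None if the pattern fails."""
    match = AT_TIME.search(text)
    return match.group("time") if match else None


bound = bind_people_company(
    "Meet with Joe Carter and Luke Hughes of Andersen Consulting"
)
when = bind_time("Dinner at 6:30 pm")
```

The indicators ("of", "from", "at", "around") appear in the expressions but bind no value, while each named group plays the role of a placeholder.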
Here is a table representing all the patterns supported by
BF. Each pattern belongs to a pattern group.
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PAT 


PAT 






GRP 


# 


PATTERN 


EXAMPLE 


1 


a 


$PEOPLE$ of 


Paul Maritz of Microsoft 






$COMPANY$ 






b 


$PEOPLE$ from 


Bill Gates, Paul Allen and 






$COMPANY$ 


Paul Maritz from Microsoft 


2 


a 


$TOPIC_UPPER$ meeting Push Technology Meeting 




b 


$TOPIC_UPPER$ mtg 


Push Technology Mtg 




c 


$TOPIC_UPPER$ demo 


Push Technology demo 




d 


$TOPIC_UPPER$ 


Push Technology interview 






interview 






e 


$TOPIC_UPPER$ 


Push Technology 






presentation 


presentation 




f 


$TOPIC_UPPER$ visit 


Push Technology visit 




g 


$TOPIC_UPPER$ briefing 


Push Technology briefing 




      h   $TOPIC_UPPER$ discussion      Push Technology discussion
      i   $TOPIC_UPPER$ workshop        Push Technology workshop
      j   $TOPIC_UPPER$ prep            Push Technology prep
      k   $TOPIC_UPPER$ review          Push Technology review
      l   $TOPIC_UPPER$ lunch           Push Technology lunch
      m   $TOPIC_UPPER$ project         Push Technology project
      n   $TOPIC_UPPER$ projects        Push Technology projects
3     a   $COMPANY$ corporation         Intel Corporation
      b   $COMPANY$ corp.               IBM Corp.
      c   $COMPANY$ systems             Cisco Systems
      d   $COMPANY$ limited             IBM limited
      e   $COMPANY$ ltd                 IBM ltd
4     a   about $TOPIC_ALL$             About intelligent agents technology
      b   discuss $TOPIC_ALL$           Discuss intelligent agents technology
      c   show $TOPIC_ALL$              Show the client our intelligent agents technology
      d   re: $TOPIC_ALL$               re: intelligent agents technology
      e   review $TOPIC_ALL$            Review intelligent agents technology
      f   agenda                        The agenda is as follows: - clean up - clean up - clean up
      g   agenda: $TOPIC_ALL$           Agenda: - demo client intelligent agents technology. - demo ecommerce.
5     a   w/$PEOPLE$ of $COMPANY$       Meet w/Joe Carter of Andersen Consulting
      b   w/$PEOPLE$ from $COMPANY$     Meet w/Joe Carter from Andersen Consulting
6     a   w/$COMPANY$ per $PEOPLE$      Talk w/Intel per Jason Foster
7     a   At $TIME$                     at 3:00 pm
      b   Around $TIME$                 Around 3:00 pm
8     a   At $LOCATION$                 At LuLu's restaurant
      b   In $LOCATION$                 in Santa Clara
9     a   Per $PEOPLE$                  per Susan Butler
10    a   call w/$PEOPLE$               Conf call w/John Smith
      b   call with $PEOPLE$            Conf call with John Smith
11    a   prep for $TOPIC_ALL$          Prep for London meeting
      b   preparation for $TOPIC_ALL$   Preparation for London meeting
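
The pattern rows above can be illustrated with a short sketch. It is not the word-by-word procedure the patent describes; it swaps in regular expressions to show how indicators and placeholders might combine. The names `PLACEHOLDERS`, `compile_pattern` and `match_patterns` are hypothetical, only a subset of placeholders is handled, and the binding rules (two capitalized words for $PEOPLE$, a run of capitalized words for $COMPANY$, $TOPIC_UPPER$ and $LOCATION$, and #:## or ##:## for $TIME$) are simplifying assumptions.

```python
import re

# Assumed, simplified binding rules for a subset of the placeholders.
CAP_WORD = r"[A-Z][\w'&.-]*"
CAP_RUN = rf"{CAP_WORD}(?: {CAP_WORD})*"
PLACEHOLDERS = {
    "$PEOPLE$": rf"{CAP_WORD} {CAP_WORD}",  # two consecutive capitalized words
    "$COMPANY$": CAP_RUN,                   # continuous capitalized words
    "$TOPIC_UPPER$": CAP_RUN,
    "$LOCATION$": CAP_RUN,
    "$TIME$": r"\d{1,2}:\d{2}",             # #:## or ##:##
}


def compile_pattern(pattern: str) -> re.Pattern:
    """Turn a pattern such as 'w/$PEOPLE$ of $COMPANY$' into a regex."""
    regex = r"\b"
    for part in re.split(r"(\$[A-Z_]+\$)", pattern):
        if part in PLACEHOLDERS:
            # Placeholders become named groups, e.g. (?P<PEOPLE>...).
            regex += f"(?P<{part.strip('$')}>{PLACEHOLDERS[part]})"
        elif part.startswith("$") and part.endswith("$") and len(part) > 2:
            raise ValueError(f"unsupported placeholder: {part}")
        elif part:
            # Indicators are literal words, matched without regard to case.
            regex += f"(?i:{re.escape(part)})"
    return re.compile(regex)


def match_patterns(text: str, patterns: list) -> list:
    """Return (pattern, bindings) for every pattern found in the text."""
    found = []
    for pattern in patterns:
        match = compile_pattern(pattern).search(text)
        if match:
            found.append((pattern, match.groupdict()))
    return found


if __name__ == "__main__":
    table = ["w/$PEOPLE$ of $COMPANY$", "At $TIME$", "$TOPIC_UPPER$ discussion"]
    print(match_patterns("Meet w/Joe Carter of Andersen Consulting", table))
```

Keeping indicators case-insensitive while placeholders stay case-sensitive mirrors the examples in the table, where "At $TIME$" binds "at 3:00 pm" but capitalization still decides what counts as a name or company.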



FIG. 4 is a detailed flowchart of pattern matching in accordance
with a preferred embodiment. Processing commences at function block
400 where the main program invokes the pattern matching application
and passes control to function block 410 to commence the pattern
match processing. Then, at function block 420, the wrapper function
loops through to process each pattern which includes determining if
a part of the text string can be bound to a pattern as shown in
function block 430. Then, at function block 440, various
placeholders are bound to values if they exist, and in function
block 441, a list of names separated by punctuation are bound, and
at function block 442 a full name is processed by finding two
capitalized words as a full name and grabbing the next letter after
a space after a word to determine if it is capitalized. Then, at
function block 443, time is parsed out of the string in an
appropriate manner and the next word after a blank space in function
block 444. Then, at function block 445, the continuous phrases of
capitalized words such as company, topic or location are bound and
in function block 446, the next word after the blank is obtained for
further processing in accordance with a preferred embodiment.
Following the match meeting field processing, function block 450 is
utilized to locate an indicator which is the head of a pattern, the
next word after the blank is obtained as shown in function block 452
and the word is checked to determine if the word is an indicator as
shown in function block 454. Then, at function block 460, the string
is parsed to locate an indicator which is not at the end of the
pattern and the next word after unnecessary white space such as that
following a line feed or a carriage return is processed as shown in
function block 462 and the word is analyzed to determine if it is an
indicator as shown in function block 464. Then, in function block
470, the temporary record is reset to the null set to prepare it for
processing the next string and at function block 480, the meeting
record is updated and at function block 482 a check is performed to
determine if an entry is already made to the meeting record before
parsing the meeting record again.

Using the Identified Meeting Fields

Now that we have identified fields within the meeting text which we
consider important, there are quite a few things we can do with
them. One of the most important applications of pattern matching is
of course to improve the query we construct which eventually gets
submitted to Alta Vista and News Page. There are also a lot of other
options and enhancements which exploit the results of pattern
matching that we can add to BF. These other options will be
described in the next section. The goal of this section is to give
the reader a good sense of how the results obtained from pattern
matching can be used to help us obtain better search results.

FIG. 5 is a flowchart of the detailed processing for preparing a
query and obtaining information from the Internet in accordance with
a preferred embodiment. Processing commences at function block 500
and immediately flows to function block 510 to process the wrapper
functionality to prepare for an Internet search utilizing a web
search engine. If the search is to utilize the Alta Vista search
engine, then at function block 530, the system takes information
from the meeting record and forms a query in function blocks 540 to
560 for submittal to the search engine. If the search is to utilize
the NewsPage search engine, then at function block 520, the system
takes information from the meeting record and forms a query in
function blocks 521 to 528.

Alta Vista Search Engine

The strength of the Alta Vista search engine is that it provides
enhanced flexibility. Using its advanced query method, one can
construct all sorts of Boolean queries and rank the search however
one wants. However, one of the biggest drawbacks with Alta Vista is
that it is not very good at handling a large query and is likely to
give back irrelevant results. If we can identify the topic and the
company within a meeting text, we can form a pretty short but
comprehensive query which will hopefully yield better results. We
also want to focus on the topics found. It may not be of much merit
to the user to find out info about a company especially if the user
already knows the company well and has had numerous meetings with
them. It's the topics they want to research.

News Page Search Engine

The strength of the News Page search engine is that it does a great
job searching for the most recent news if you are able to give it a
valid company name. Therefore when we submit a query to the News
Page web site, we use whatever company name we can identify and only
if we cannot find one do we use the topics found to form a query. If
neither one is found, then no search is performed. The algorithm
utilized to form the query to submit to Alta Vista is illustrated in
FIG. 7. The algorithm that we will use to form the query to submit
to News Page is illustrated in FIG. 8.

The following table describes in detail each function in accordance
with a preferred embodiment. The order in which functions appear
mimics the process flow as closely as possible. When there are
situations in which a function is called several times, this
function will be listed after the first function which calls it and
its description is not duplicated after every subsequent function
which calls it.



Procedure Name: Main (BF.Main)
Type: Public Sub
Called By: None
Description: This is the main function where the program first
launches. It initializes BF with the appropriate parameters (e.g.,
Internet time-out, stop list ...) and calls GoBF to launch the main
part of the program.

Procedure Name: ProcessCommandLine (BF.Main)
Type: Private Sub
Called By: Main
Description: This function parses the command line. It assumes that
the delimiter indicating the beginning of input from Munin is stored
in the constant CMD_SEPARATOR.

Procedure Name: CreateStopList (BF.Main)
Type: Private Function
Description: This function sets up a stop list for future use to
parse out unwanted words from the meeting text. There are commas on
each side of each word to enable straight checking.

Procedure Name: CreatePatterns (BF.PatternMatch)
Type: Public Sub
Description: This procedure is called once when BF is first
initialized to create all the potential patterns that portions of
the meeting text can bind to. A pattern can contain however many
elements as needed. There are two types of elements. The first type
of elements are indicators. These are real words which delimit the
potential of a meeting field (e.g., company) to follow. Most of
these indicators are stop words as expected because stop words are
words usually common to all meeting text so it makes sense they form
patterns. The second type of elements are special strings which
represent placeholders. A placeholder is always in the form of $*$
where * can be either PEOPLE, COMPANY, TOPIC_UPPER, TIME, LOCATION
or TOPIC_ALL. A pattern can begin with either one of the two types
of elements and can be however long, involving however any
number/type of elements. This procedure dynamically creates a new
pattern record for each pattern in the table and it also dynamically
creates new tAPatternElements for each element within a pattern. In
addition, there is the concept of being able to substitute
indicators within a pattern. For example, the pattern $PEOPLE$ of
$COMPANY$ is similar to the pattern $PEOPLE$ from $COMPANY$. "from"
is a substitute for "of". Our structure should be able to express
such a need for substitution.

Procedure Name: GoBF (BF.Main)
Type: Public Sub
Called By: Main
Description: This is a wrapper procedure that calls both the parsing
and the searching subroutines of the BF. It is also responsible for
sending data back to Munin.

Procedure Name: ParseMeetingText (BF.Parse)
Type: Public Function
Called By: GoBackGroundFinder
Description: This function takes the initial meeting text and
identifies the userID of the record as well as other parts of the
meeting text including the title, body, participant list, location
and time. In addition, we call a helper function ProcessStopList to
eliminate all the unwanted words from the original meeting title and
meeting body so that only keywords are left. The information parsed
out is stored in the MeetingRecord structure. Note that this
function does no error checking and for the most time assumes that
the meeting text string is correctly formatted by Munin. The
important variable is thisMeetingRecord, which is the temp holder
for all info regarding the current meeting. It's eventually returned
to caller.

Procedure Name: FormatDelimitation (BF.Parse)
Type: Private
Called By: ParseMeetingText, DetermineNumWords, GetAWordFromString
Description: There are 4 ways in which the delimiters can be placed.
We take care of all these cases by reducing them down to Case 4 in
which there are no delimiters around but only between fields in a
string (e.g., A::B::C).

Procedure Name: DetermineNumWords (BF.Parse)
Type: Public Function
Called By: ParseMeetingText, ProcessStopList
Description: This function determines how many words there are in a
string (stInEvalString). The function assumes that each word is
separated by a designated separator as specified in stSeparator. The
return type is an integer that indicates how many words have been
found assuming each word in the string is separated by stSeparator.
This function is used along with GetAWordFromString and should be
called before calling GetAWordFromString.

Procedure Name: GetAWordFromString (BF.Parse)
Type: Public Function
Called By: ParseMeetingText, ProcessStopList
Description: This function extracts the ith word of the string
(stInEvalString) assuming that each word in the string is separated
by a designated separator contained in the variable stSeparator. In
most cases, use this function with DetermineNumWords. The function
returns the wanted word. This function checks to make sure that
iInWordNum is within bounds so that i is not greater than the total
number of words in string or less than/equal to zero. If it is out
of bounds, we return empty string to indicate we can't get anything.
We try to make sure this doesn't happen by calling DetermineNumWords
first.

Procedure Name: ParseAndCleanPhrase (BF.Parse)
Type: Private Function
Called By: ParseMeetingText
Description: This function first grabs the word and sends it to
CleanWord in order to strip the stuff that nobody wants. There are
things in parseWord that will kill the word, so we will need a
method of looping through the body and rejecting words without
killing the whole function. I guess keep CleanWord and check a
return value. OK, now I have a word so I need to send it down the
parse chain. This chain goes ParseCleanPhrase -> CleanWord ->
EvaluateWord. If the word gets through the entire chain without
being killed, it will be added at the end to our keyword string.
First would be the function that checks for "/" as a delimiter and
extracts the parts of that. This I will call "StitchFace" (Denise is
more normal and calls it GetAWordFromString). If this finds words,
then each of these will be sent, in turn, down the chain. If these
get through the entire chain without being added or killed then they
will be added rather than tossed.

Procedure Name: FindMin (BF.Parse)
Type: Private Function
Called By: ParseAndCleanPhrase
Description: This function takes in 6 input values and evaluates to
see what the minimum non zero value is. It first creates an array as
a holder so that it can sort the five input values in ascending
order. Thus the minimum value will be the first non zero value
element of the array. If we go through entire array without finding
a non zero value, we know that there is an error and we exit the
function.

Procedure Name: CleanWord (BF.Parse)
Type: Private Function
Called By: ParseAndCleanPhrase
Description: This function tries to clean up a word in a meeting
text. It first of all determines if the string is of a valid length.
It then passes it through a series of tests and, when needed, it
will edit the word and strip unnecessary characters off of it. Such
tests include getting rid of file extensions, non chars, numbers
etc.

Procedure Name: EvaluateWord (BF.Parse)
Type: Private Function
Called By: ParseAndCleanPhrase
Description: This function tests to see if this word is in the stop
list so it can determine whether to eliminate the word from the
original meeting text. If a word is not in the stoplist, it should
stay around as a keyword and this function exits beautifully with no
errors. However, if the word is a stopword, an error must be
returned. We must properly delimit the input test string so we don't
accidentally retrieve substrings.

Procedure Name: GoPatternMatch (BF.PatternMatch)
Type: Public Sub
Called By: GoBF
Description: This procedure is called when our QueryMethod is set to
complex query meaning we do want to do all the pattern matching
stuff. It's a simple wrapper function which initializes some arrays
and then invokes pattern matching on the title and the body.

Procedure Name: MatchPatterns (BF.PatternMatch)
Type: Public Sub
Called By: GoPatternMatch
Description: This procedure loops through every pattern in the
pattern table and tries to identify different fields within a
meeting text specified by stInEvalString. For debugging purposes it
also tries to tabulate how many times a certain pattern was
triggered and stores it in gTabulateMatches to see which pattern
fired the most. gTabulateMatches is stored as a global because we
want to be able to run a batch file of 40 or 50 test strings and
still be able to know how often a pattern was triggered.

Procedure Name: MatchAPattern (BF.PatternMatch)
Type: Private Function
Called By: MatchPatterns
Description: This function goes through each element in the current
pattern. It first evaluates to determine whether element is a
placeholder or an indicator. If it is a placeholder, then it will
try to bind the placeholder with some value. If it is an indicator,
then we try to locate it. There is a trick however. Depending on
whether the current element is the head of the pattern or not we
want to take different actions. If we are at the head, we want to
look for the indicator or the placeholder. If we can't find it, then
we know that the current pattern doesn't exist and we quit. However,
if it is not the head, then we continue looking, because there may
still be a head somewhere. We retry in this case.

Procedure Name: MatchMeetingField (BF.PatternMatch)
Type: Private Function
Called By: MatchAPattern
Description: This function uses a big switch statement to first
determine what kind of placeholder we are talking about and
depending on what type of placeholder, we have specific requirements
and different binding criteria as specified in the subsequent
functions called such as BindNames, BindTime etc. If binding is
successful we add it to our guessing record.

Procedure Name: BindNames (BF.PatternMatch)
Type: Private Function
Called By: MatchMeetingField
Description: In this function, we try to match names to the
corresponding placeholder $PEOPLE$. Names are defined as any
consecutive two words which are capitalized. We also want to
retrieve a series of names which are connected by and, or & so we
look until we don't see any of these 3 separators anymore. Note that
we don't want to bind single word names because it is probably too
general anyway so we don't want to produce broad but irrelevant
results. This function calls BindAFullName which binds one name so
in a sense BindNames collects all the results from BindAFullName.

Procedure Name: BindAFullName (BF.PatternMatch)
Type: Private Function
Called By: BindNames
Description: This function tries to bind a full name. If the
$PEOPLE$ placeholder is not the head of the pattern, we know that it
has to come right at the beginning of the test string because we've
been deleting stuff off the head of the string all along. If it is
the head, we search until we find something that looks like a full
name. If we can't find it, then there's no such pattern in the text
entirely and we quit entirely from this pattern. This should
eventually return us to the next pattern in MatchPatterns.

Procedure Name: GetNextWordAfterWhiteSpace (BF.PatternMatch)
Type: Private Function
Called By: BindAFullName, BindTime, BindCompanyTopicLoc
Description: This function grabs the next word in a test string. It
looks for the next word after white spaces, @ or /. The word is
defined to end when we encounter another one of these white spaces
or separators.

Procedure Name: BindTime (BF.PatternMatch)
Type: Private Function
Called By: MatchMeetingField
Description: Get the immediate next word and see if it looks like a
time pattern. If so we've found a time and so we want to add it to
the record. We probably should add more time patterns. But people
don't seem to like to enter the time in their titles these days
especially since we now have tools like OutLook.

Procedure Name: BindCompanyTopicLoc (BF.PatternMatch)
Type: Private Function
Called By: MatchMeetingField
Description: This function finds a continuous capitalized string and
binds it to stMatch which is passed by reference from
MatchMeetingField. A continuous capitalized string is a sequence of
capitalized words which are not interrupted by things like, etc.
There's probably more stuff we can add to the list of interruptions.

Procedure Name: LocatePatternHead (BF.PatternMatch)
Type: Private Function
Called By: MatchAPattern
Description: This function tries to locate an element which is an
indicator. Note that this indicator SHOULD BE AT THE HEAD of the
pattern otherwise it would have gone to the function LocateIndicator
instead. Therefore, we keep on grabbing the next word until either
there's no word for us to grab (quit) or if we find one of the
indicators we are looking for.

Procedure Name: ContainInArray (BF.PatternMatch)
Type: Private Function
Called By: LocatePatternHead, LocateIndicator
Description: This function is really simple. It loops through all
the elements in the array to find a matching string.

Procedure Name: LocateIndicator (BF.PatternMatch)
Type: Private Function
Called By: MatchAPattern
Description: This function tries to locate an element which is an
indicator. Note that this indicator is NOT at the head of the
pattern otherwise it would have gone to LocatePatternHead instead.
Because of this, if our pattern is to be satisfied, the next word we
grab HAS to be the indicator or else we would have failed. Thus we
only grab one word, test to see if it is a valid indicator and then
return result.

Procedure Name: InitializeGuessRecord (BF.PatternMatch)
Type: Private Sub
Called By: MatchAPattern
Description: This function reinitializes our temporary test
structure because we have already transferred the info to the
permanent structure, we can reinitialize it so they each have one
element.

Procedure Name: AddToMeetingRecord (BF.PatternMatch)
Type: Private Sub
Called By: MatchAPattern
Description: This function is only called when we know that the
information stored in tInCurrGuesses is valid meaning that it
represents legitimate guesses of meeting fields ready to be stored
in the permanent record, tInMeetingRecord. We check to make sure
that we do not store duplicates and we also want to clean up what we
want to store so that there's no clutter such as punctuation, etc.
The reason why we don't clean up until now is to save time. We don't
waste resources calling ParseAndCleanPhrase until we know for sure
that we are going to add it permanently.

Procedure Name: NoDuplicateEntry (BF.PatternMatch)
Type: Private Function
Called By: AddToMeetingRecord
Description: This function loops through each element in the array
to make sure that the test string aString is not the same as any of
the strings already stored in the array. Slightly different from
ContainInArray.

Procedure Name: SearchAltaVista (BF.Search)
Type: Public Function
Called By: GoBackGroundFinder
Description: This function prepares a query to be submitted to
AltaVista Search engine. It submits it and then parses the returning
result in the appropriate format containing the title, URL and
body/summary of each story retrieved. The number of stories
retrieved is specified by the constant NUM_AV_STORIES. Important
variables include stURLAltaVista, used to store query to submit, and
stResultHTML, used to store html from page specified by
stURLAltaVista.

Procedure Name: ConstructAltaVistaURL (BF.Search)
Type: Private Function
Called By: SearchAltaVista
Description: This function constructs the URL string for the alta
vista search engine using the advanced query search mode. It
includes the keywords to be used, the language and how we want to
rank the search. Depending on whether we want to use the results of
our pattern matching unit, we construct our query differently.

Procedure Name: ConstructSimpleKeyWord (BF.Search)
Type: Private Function
Called By: ConstructAltaVistaURL, ConstructNewsPageURL
Description: This function marches down the list of keywords stored
in the stTitleKW or stBodyKW fields of the input meeting record and
links them up into one string with each keyword separated by a
connector as determined by the input variable stInConnector. Returns
this newly constructed string.

Procedure Name: ConstructComplexAVKeyWord (BF.Search)
Type: Private Function
Called By: ConstructAltaVistaURL
Description: This function constructs the keywords to be sent to the
AltaVista site. Unlike ConstructSimpleKeyWord which simply takes all
the keywords from the title to form the query, this function will
look at the results of BF's pattern matching process and see if we
are able to identify any specific company names or topics for
constructing the queries. Query will include company and topic
identified and default to simple query if we cannot identify either
company or topic.

Procedure Name: JoinWithConnectors (BF.Search)
Type: Private Function
Called By: ConstructComplexAVKeyWord, ConstructComplexNPKeyWord,
RefineWithRank
Description: This function simply replaces the spaces between the
words within the string with a connector which is specified by the
input.

Procedure Name: RefineWithDate (NOT CALLED AT THE MOMENT) (BF.Search)
Type: Private Function
Called By: ConstructAltaVistaURL
Description: This function constructs the date portion of the alta
vista query and returns this portion of the URL as a string. It
makes sure that alta vista searches for articles within the past
PAST_NDAYS.

Procedure Name: RefineWithRank (BF.Search)
Type: Private Function
Called By: ConstructAltaVistaURL
Description: This function constructs the string needed to be passed
to AltaVista in order to rank an advanced query search. If we are
constructing the simple query we will take in all the keywords from
the title. For the complex query, we will take in words from company
and topic, much the same way we formed the query in
ConstructComplexAVKeyWord.

Procedure Name: IdentifyBlock (BF.Parse)
Type: Public Function
Called By: SearchAltaVista, SearchNewsPage
Description: This function extracts the block within a string marked
by the beginning and the ending tag given as inputs starting at a
certain location (iStart). The block retrieved does not include the
tags themselves. If the block cannot be identified with the
specified delimiters, we return unsuccessful through the parameter
iReturnSuccess passed to us by reference. The return type is the
block retrieved.

Procedure Name: IsOpenURLError (BF.Error)
Type: Public Function
Called By: SearchAltaVista, SearchNewsPage
Description: This function determines whether the error encountered
is that of a timeout error. It restores the mouse to default arrow
and then returns true if it is a time out or false otherwise.

Procedure Name: SearchNewsPage (BF.Search)
Type: Public Function
Called By: GoBackGroundFinder
Description: This function prepares a query to be submitted to
NewsPage Search engine. It submits it and then parses the returning
result in the appropriate format containing the title, URL and
body/summary of each story retrieved. The number of stories
retrieved is specified by the constant NUM_NP_STORIES.

Procedure Name: ConstructNewsPageURL (BF.Search)
Type: Private Function
Called By: SearchNewsPage
Description: This function constructs the URL to send to the
NewsPage site. It uses the information contained in the input
meeting record to determine what keywords to use. Also depending
whether we want simple or complex query, we call different functions
to form strings.

Procedure Name: ConstructComplexNPKeyWord (BF.Search)
Type: Private Function
Called By: ConstructNewsPageURL
Description: This function constructs the keywords to be sent to the
NewsPage site. Unlike ConstructKeyWordString which simply takes all
the keywords from the title to form the query, this function will
look at the results of BF's pattern matching process and see if we
are able to identify any specific company names or topics for
constructing the queries. Since newspage works best when we have a
company name, we'll use only the company name and only if there is
no company will we use topic.

Procedure Name: ConstructOverallResult (BF.Main)
Type: Private Function
Called By: GoBackGroundFinder
Description: This function takes in as input an array of strings
(stInStories) and a MeetingRecord which stores the information for
the current meeting. Each element in the array stores the stories
retrieved from each information source. The function simply
constructs the appropriate output to send to Munin including a
return message type to let Munin know that it is the BF responding
and also the original user_id and meeting title so Munin knows which
meeting BF is talking about.

Procedure Name: ConnectAndTransferToMunin (BF.Main)
Type: Public Sub
Called By: GoBackGroundFinder
Description: This function allows Background Finder to connect to
Munin and eventually transport information to Munin. We will be
using the UDP protocol instead of the TCP protocol so we have to set
up the remote host and port correctly. We use a global string to
store gResultOverall because although it is unnecessary with UDP, it
is needed with TCP and if we ever switch back we don't want to
change code.

Procedure Name: DisconnectFromMuninAndQuit (BF.Main)
Type: Public Sub
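
The keyword-construction procedures in the table above can be sketched as follows. This is only an illustration under stated assumptions, not the original Visual Basic code: the `MeetingRecord` fields, the Python function names, the quoting of phrases, and the default connectors (" AND " and "+") are all hypothetical. What is taken from the descriptions is the selection logic: the Alta Vista query uses identified companies and topics and defaults to the simple query when neither is found, while the NewsPage query uses the company name, falls back to topics only when there is no company, and yields no search when neither is found.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MeetingRecord:
    """Hypothetical stand-in for the MeetingRecord structure."""
    title_keywords: List[str] = field(default_factory=list)
    companies: List[str] = field(default_factory=list)
    topics: List[str] = field(default_factory=list)


def join_with_connectors(phrase: str, connector: str) -> str:
    """Replace the spaces between the words of a phrase with a connector."""
    return connector.join(phrase.split())


def construct_simple_keyword(record: MeetingRecord, connector: str) -> str:
    """Link every title keyword into one string."""
    return connector.join(record.title_keywords)


def construct_complex_av_keyword(record: MeetingRecord,
                                 connector: str = " AND ") -> str:
    """Use identified companies and topics; default to the simple query."""
    phrases = record.companies + record.topics
    if not phrases:
        return construct_simple_keyword(record, connector)
    # Quoting each phrase is an assumption about the query syntax.
    return connector.join(f'"{phrase}"' for phrase in phrases)


def construct_complex_np_keyword(record: MeetingRecord,
                                 connector: str = "+") -> Optional[str]:
    """Company first, topics only without a company, None for no search."""
    phrases = record.companies or record.topics
    if not phrases:
        return None
    return connector.join(join_with_connectors(p, connector) for p in phrases)


if __name__ == "__main__":
    record = MeetingRecord(
        title_keywords=["push", "technology", "workshop"],
        companies=["Andersen Consulting"],
        topics=["Push Technology"],
    )
    print(construct_complex_av_keyword(record))
    print(construct_complex_np_keyword(record))
```

Returning `None` from the NewsPage helper gives the caller an explicit signal to skip the search, which matches the statement that no search is performed when neither a company nor a topic is found.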



FIG. 6 is a flowchart of the actual code utilized to prepare and
submit searches to the Alta Vista and Newspage search engines in
accordance with a preferred embodiment. Processing commences at
function block 610 where a command line is utilized to update a
calendar entry with specific calendar information. The message is
next posted in accordance with function block 620 and a meeting
record is created to store the current meeting information in
accordance with function block 630. Then, in function block 640 the
query is submitted to the Alta Vista search engine and in function
block 650, the query is submitted to the Newspage search engine.
When a message is returned from the search engine, it is stored in a
results data structure as shown in function block 660 and the
information is processed and stored in summary form in a file for
use in preparation for the meeting as detailed in function block
670.
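
The flow of FIG. 6 can be sketched roughly as below. This is a hedged outline rather than the code the figure depicts: `run_background_search`, `summarize`, the `Story` dictionary layout and the injected `build_query` and search callables are all hypothetical, and the engines are passed in as plain functions so that no real network call or search-engine API is assumed.

```python
from typing import Callable, Dict, List

Story = Dict[str, str]                   # assumed keys: title, url, summary
SearchFn = Callable[[str], List[Story]]  # query string -> stories


def run_background_search(meeting_text: str,
                          engines: Dict[str, SearchFn],
                          build_query: Callable[[str, str], str]
                          ) -> Dict[str, List[Story]]:
    """Query each engine for one meeting and collect the returned stories."""
    results: Dict[str, List[Story]] = {}
    for name, search in engines.items():
        query = build_query(name, meeting_text)
        if not query:
            # Nothing usable was identified for this engine: skip it.
            continue
        results[name] = search(query)
    return results


def summarize(results: Dict[str, List[Story]]) -> str:
    """Render the collected stories in a short summary form."""
    lines = []
    for engine, stories in results.items():
        lines.append(f"[{engine}]")
        for story in stories:
            lines.append(f"{story['title']} <{story['url']}>")
    return "\n".join(lines)


if __name__ == "__main__":
    fake_engines = {
        "AltaVista": lambda q: [{"title": "Result for " + q,
                                 "url": "http://example.com/1",
                                 "summary": ""}],
    }
    found = run_background_search("Push Technology workshop",
                                  fake_engines,
                                  lambda name, text: text)
    print(summarize(found))
```

Injecting the engines as callables keeps the sketch testable with stubs and leaves the actual URL construction and HTML parsing to the procedures listed in the table.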

FIG. 7 provides more detail on creating the query in accordance with
a preferred embodiment. Processing commences at function block 710
where the meeting record is parsed to obtain potential companies,
people, topics, location and a time. Then, in function block 720, at
least one topic is identified and in function block 720, at least
one company name is identified and finally in function block 740, a
decision is made on what material to transmit to the file for
ultimate consumption by the user.

FIG. 8 is a variation on the query theme presented in FIG. 7. A
meeting record is parsed in function block 800, a company is
identified in function block 820, a topic is identified in function
block 830 and finally in function block 840 the topic and/or the
company is utilized in formulating the query.

Alternative embodiments for adding various specific features for
specific user requirements are discussed below.

Enhance Target Rate for Pattern Matching

To increase BF's performance, more patterns/pattern groups are added
to the procedure "CreatePatterns." The existing code for declaring
patterns can be used as a template for future patterns. Because
everything is stored as dynamic arrays, it is convenient to reuse
code by cutting and pasting.

The functions BindName, BindTime, BindCompanyLocTopic which are
responsible for associating a value with a placeholder can be
enhanced. The enhancement is realized by increasing the set of
criteria for binding a certain meeting field in order to increase
the number of binding values. For example, BindTime currently
accepts and binds all values in the form of ##:## or #:##. To
increase the times we can bind, we may want BindTime to also accept
the numbers 1 to 12 followed by the more aesthetic time terminology
"o'clock." Other possible enhancements include vocabulary based
recognition algorithms and assigning an accuracy rate to each guess
BF makes, allowing only guesses which meet a certain threshold to be
valid.

Depending on what location the system identifies through pattern
matching or alternatively depending on what location the user
indicates as the meeting place, a system in accordance with a
preferred embodiment suggests a plurality of fine restaurants
whenever it detects the words lunch/dinner/breakfast. We can also
use a site like company finder to confirm what we got is indeed a
company name or if there is no company name that pattern matching
can identify, we can use a company finder web site as a "dictionary"
for us to determine whether certain capitalized words represent a
company name. We can even display stock prices and breaking news for
a company that we have identified.

Wireless Bargain Identification in Accordance with a Preferred
Embodiment

FIG. 9 is a flow diagram that depicts the hardware and logical flow
of control for a device and a software system designed to allow
Web-based comparison shopping in conventional, physical, non-Web
retail environments. A wireless phone or similar hand-held wireless
device 920 with Internet Protocol capability is combined with a
miniature barcode reader 910 (installed either inside the phone or
on a short cable) and used to scan the Universal Product Code (UPC)
bar code on a book or other product 900. The wireless device 920
transmits the bar code via an antennae 930 to the Pocket
BargainFinder Service Module (running on a Web server) 940, which
converts it to (in the case of books) its International Standard
Book Number or (in the case of other products) whatever identifier
is appropriate. The Service Module then contacts the appropriate
third-

held device 920 utilizes a wireless modem such as a Ricochet SE
Wireless Modem from Metricom. Utilizing this device, a user can hang
out in a coffee shop with a portable computer perched on a rickety
little table, with a latte sloshing dangerously close to the
keyboard, and access the Internet at speeds rivaling direct connect
via a telephone line.

The 8-ounce Ricochet SE Wireless Modem is about as large as a pack
of cigarettes and setup is extremely simple: simply attach the modem
to the back of your portable's screen with the included piece of
Velcro, plug the cable into the serial port, flip up the stubby
antenna, and transmit. Software setup is equally easy: a
straightforward installer adds the Ricochet modem drivers and places
the connection icon on your desktop. The functional aspects of the
modem are identical to that of a traditional telephone modem.

Of course, wireless performance isn't nearly as reliable as a
traditional dial-up phone connection. We were able to get strong
connections in several San Francisco locations as long as we stayed
near the windows. But inside CNET's all-brick headquarters, the
Ricochet couldn't connect at all. When you do get online,
performance of up to 28.8 kbps is available with graceful
degradation to slower speeds. But even the slower speeds didn't
disappoint. Compared to the alternative, connecting via a cellular
modem, the Ricochet is much faster, more reliable, and less
expensive to use.

Naturally, the SE Wireless is battery powered. The modem has
continuous battery life of up to 12 hours. And in accordance with a
preferred embodiment, we ran down our portable computer's dual cells
before the Ricochet started to fade.

Thus, utilizing the wireless modem, a user may utilize the web
server software 940 to identify the right product 950 and then use
an appropriate device's key(s) to select a supplier and place an
order in accordance with a preferred embodiment. The BargainFinder
Service Module then consummates the order with the appropriate
third-party Web supplier 960.

mySite! Personal Web Site & Intentions Value Network Prototype

mySite! is a high-impact, Internet-based application in accordance
with a preferred embodiment that is focused on the theme of
delivering services and providing a personalized experience for each
customer via a personal web site in a buyer-centric world. The
services are intuitively organized around satisfying customer
intentions, that is, fundamental life needs or objectives that
require extensive planning decisions, and coordination across
several dimensions, such as financial planning, healthcare, personal
and professional development, family life, and other concerns. Each
member owns and maintains his own profile, enabling him to create
and browse content in the system targeted specifically at him. From
the time a demand for products or services is entered, to the
completion of payment, intelligent agents are utilized to conduct
research, execute transactions and provide advice. By using advanced
profiling and filtering, the intelligent agents learn about the
user, improving the services they deliver. Customer intentions
include Managing Daily Logistics (e.g., email, calendar, contacts,
to-do list, bill payment, shopping, and travel planning); and Moving
to a New Community (e.g., finding a place to live, moving
party Web site(s) to find price, shipping and availability household possessions, getting travel and shipping insurance 
information on the product from various Web suppliers 950. 65 coverage, notifying business and personal contacts, learning 
This information is formatted and displayed on the hand- about the new community). From a consumer standpoint, 
held device's screen. The IP wireless phone or other hand mySite! provides a central location where a user can access 
relevant products and services and accomplish daily tasks
with ultimate ease and convenience.

From a business standpoint, mySite! represents a value-added
and innovative way to effectively attract, service, and
retain customers. Intention value networks allow a user to
enter through a personalized site and, with the assistance
of a learning, intelligent agent, seamlessly interact with
network participants. An intention value network in accordance
with a preferred embodiment provides superior value.
It provides twenty-four hours a day, seven days a week access
to customized information, advice and products. The information
is personalized so that each member views content
that is highly customized to assure relevance to the required
target user.

Egocentric Interface

An Egocentric Interface is a user interface crafted to
satisfy a particular user's needs, preferences and current
context. It utilizes the user's personal information that is
stored in a central profile database to customize the interface.
The user can set security permissions on and preferences
for interface elements and content. The content integrated
into the Egocentric Interface is customized with
related information about the user. When displaying content,
the Egocentric Interface will include the relationship
between that content and the user in a way that demonstrates
how the content relates to the user. For instance, when
displaying information about an upcoming ski trip the user
has signed up for, the interface will include information
about events from the user's personal calendar and contact
list, such as other people who will be in the area during the
ski trip. This serves to put the new piece of information into
a context familiar to the individual user.

FIG. 10A describes the Intention Value Network Architecture
implementation for the World Wide Web. For simplification
purposes, this diagram ignores the complexity
pertaining to security, scalability and privacy. The customer
can access the Intention Value Network with any Internet
web browser 1010, such as Netscape Navigator or Microsoft
Internet Explorer, running on a personal computer connected
to the Internet or a Personal Digital Assistant with wireless
capability. See FIG. 17 for a more detailed description of the
multiple methods for accessing an Intention Value Network.
The customer accesses the Intention Value Network through
the unique name or IP address associated with the Integrator's
Web Server 1020. The Integrator creates the Intention
Value Network using a combination of resources, such as the
Intention Database 1030, the Content Database 1040, the
Supplier Profile Database 1050, and the Customer Profile
Database 1060.

The Intention Database 1030 stores all of the information
about the structure of the intention and the types of products
and services needed to fulfill the intention. Information in
this database includes intention steps, areas of interest,
layout templates and personalization templates. The Content
Database 1040 stores all of the information related to the
intention, such as advice, referral information, personalized
content, satisfaction ratings, product ratings and progress
reports.

The Supplier Profile Database 1050 contains information
about the product and service providers integrated into the
intention. The information contained in this database provides
a link between the intention framework and the
suppliers. It includes product lists, features and descriptions,
and addresses of the suppliers' product web sites. The
Customer Profile Database 1060 contains personal information
about the customers, such as name, address, social
security number and credit card information, personal
preferences, behavioral information, history, and web site
layout preferences. The Supplier's Web Server 1070 provides
access to all of the supplier's databases necessary to
provide information and transactional support to the customer.

The Product Information Database 1080 stores all
product-related information, such as features, availability
and pricing. The Product Order Database 1090 stores all
customer orders. The interface to this database may be
through an Enterprise Resource Planning application offered
by SAP, Baan, Oracle or others, or it may be accessible
directly through the Supplier's Web Server or application
server. The Customer Information Database 1091 stores all
of the customer information that the supplier needs to
complete a transaction or maintain customer records.

FIG. 10B is a flowchart providing the logic utilized to
create a web page within the Egocentric Interface. The
environment assumes a web server and a web browser
connected through a TCP/IP network, such as over the
public Internet or a private Intranet. Possible web servers
could include Microsoft Internet Information Server,
Netscape Enterprise Server or Apache. Possible web browsers
include Microsoft Internet Explorer or Netscape Navigator.
The client (i.e. web browser) makes a request 1001 to
the server (i.e. web server) for a particular web page. This is
usually accomplished by a user clicking on a button or a link
within a web page. The web server gets the layout and
content preferences 1002 for that particular user, with the
request to the database keyed off of a unique user id stored
in the client (i.e. web browser) and the User profile database
1003. The web server then retrieves the content 1004 for the
page that has been requested from the content database
1005. The relevant user-centric content, such as calendar,
email, contact list, and task list items are then retrieved
1006. (See FIG. 11 for a more detailed description of this
process.) The query to the database utilizes the user content
preferences stored as part of the user profile in the User
profile database 1003 to filter the content that is returned.
The content that is returned is then formatted into a web
page 1007 according to the layout preferences defined in the
user profile. The web page is then returned to the client and
displayed to the user 1008.

FIG. 11 describes the process of retrieving user-centric
content to add to a web page. This process describes 1006 in
FIG. 10B in a more detailed fashion. It assumes that the
server already has obtained the user profile and the existing
content that is going to be integrated into this page. The
server parses 1110 the filtered content, looking for instances
of events, contact names and email addresses. If any of these
are found, they are tagged and stored in a temporary holding
space. Then, the server tries to find any user-centric content
1120 stored in various databases.

This involves matching the tagged items in the temporary
storage space with calendar items 1130 in the Calendar
Database 1140; email items 1115 in the Email Database
1114; contact items 1117 in the Contact Database 1168; task
list items 1119 in the Task list Database 1118; and news
items 1121 in the News Database 1120. After retrieving any
relevant user-centric content, it is compiled together and
returned 1122.

User Persona

The system allows the user to create a number of different
personas that aggregate profile information into sets that are
useful in different contexts. A user may create one persona
when making purchases for his home. This persona may 
contain his home address and may indicate that this user is 
looking to find a good bargain when shopping. The same 
user may create a second persona that can be used when he 
is in a work context. This persona may store the user's work 
address and may indicate that the user prefers certain ven- 
dors or works for a certain company that has a discount 
program in place. When shopping for work-related items, 
the user may use this persona. A persona may also contain 
rules and restrictions. For instance, the work persona may 
restrict the user to making airline reservations with only one 
travel agent and utilizing booking rules set up by his 
employer. 

FIG. 12 describes the relationship between a user, his
multiple personas and his multiple profiles. At the User
Level is the User Profile 1200. This profile describes the user 
and his account information. There is one unique record in 
the database for each user who has an account. Attached to 
each user are multiple Personas 1220, 1230 & 1240. These 
Personas are used to group multiple profiles into useful 
contexts. For instance, consider a user who lives in San 
Francisco and works in Palo Alto, but has a mountain cabin 
in Lake Tahoe. He has three different contexts in which he 
might be accessing his site. One context is work-related. The 
other two are home-life related, but in different locations. 
The user can create a Persona for Work 1220, a Persona for 
Home 1230, and a Persona for his cabin home 1240. Each 
Persona references a different General Profile 1250, 1260 
and 1270 which contains the address for that location. 
Hence, there are three General Profiles. Each Persona also 
references one of two Travel Profiles. The user maintains a 
Work Travel Profile 1280 that contains all of the business 
rules related to booking tickets and making reservations. 
This Profile may specify, for instance, that this person only 
travels in Business or First Class and his preferred airline is 
United Airlines. The Work Persona references this Work 
Travel Profile. The user may also maintain a Home Travel 
Profile 1290 that specifies that he prefers to travel in coach 
and wants to find non-refundable fares, since they are
generally cheaper. Both the Persona for Home and the 
Persona for the cabin home point to the Home Travel Profile. 
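
The relationship of FIG. 12 can be illustrated with a minimal sketch. The class names, field names and the user name below are illustrative assumptions and are not taken from the specification; the sketch shows only that each Persona references its own General Profile while two Personas may share one Travel Profile.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A named bundle of profile fields (hypothetical structure)."""
    name: str
    fields: dict = field(default_factory=dict)


@dataclass
class Persona:
    """Groups one General Profile and one Travel Profile into a context."""
    name: str
    general: Profile
    travel: Profile


@dataclass
class User:
    username: str
    personas: dict = field(default_factory=dict)


# Two Travel Profiles, as in the example of FIG. 12.
work_travel = Profile("Work Travel", {"cabin_class": "Business",
                                      "preferred_airline": "United Airlines"})
home_travel = Profile("Home Travel", {"cabin_class": "Coach",
                                      "fare_type": "non-refundable"})

user = User("example_user")  # hypothetical account name
user.personas["work"] = Persona(
    "Work", Profile("General (work)", {"city": "Palo Alto"}), work_travel)
user.personas["home"] = Persona(
    "Home", Profile("General (home)", {"city": "San Francisco"}), home_travel)
user.personas["cabin"] = Persona(
    "Cabin", Profile("General (cabin)", {"city": "Lake Tahoe"}), home_travel)
```

Because the Home and cabin Personas hold a reference to the same Travel Profile object, a change to the Home Travel Profile is seen by both, which is one plausible reading of "point to" in the description above.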

FIG. 13 describes the data model that supports the Per- 
sona concept. The user table 1310 contains a record for each 
user who has an account in the system. This table contains 
a user name and a password 1320 as well as a unique 
identifier. Each user can have multiple Personas 1330, which
act as containers for more specialized structures called 
Profiles 1340. Profiles contain the detailed personal infor- 
mation in Profile Field 1350 records. Attached to each 
Profile are sets of Profile Restriction 1360 records. These 
each contain a Name 1370 and a Rule 1380, which define the 
restriction. The Rule is in the form of a pattern like (if x then
y), which allows the Rule to be restricted to certain uses. An 
example Profile Restriction would be the rule that dictates 
that the user cannot book a flight on a certain airline 
contained in the list. This Profile Restriction could be 
contained in the "Travel" Profile of the "Work" Persona set 
up by the user's employer, for instance. Each Profile Field 
also contains a set of Permissions 1390 that are contained in 
that record. These permissions dictate who has what access 
rights to that particular Profile Field's information. 
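
A Profile Restriction of the "(if x then y)" form can be sketched as follows. The specification does not give a rule syntax, so representing the condition and the requirement as two predicates is an assumption, as are the names ProfileRestriction, is_permitted and the placeholder airline "Example Air".

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProfileRestriction:
    """A named rule of the form "if condition then requirement" (assumed shape)."""
    name: str
    condition: Callable[[dict], bool]    # the "if x" part
    requirement: Callable[[dict], bool]  # the "then y" part

    def allows(self, action: dict) -> bool:
        # The requirement is enforced only for actions the condition selects.
        if self.condition(action):
            return self.requirement(action)
        return True


# Hypothetical list of airlines the employer does not allow.
BLOCKED_AIRLINES = {"Example Air"}

no_blocked_airlines = ProfileRestriction(
    name="No flights on listed airlines",
    condition=lambda action: action.get("type") == "flight",
    requirement=lambda action: action.get("airline") not in BLOCKED_AIRLINES,
)


def is_permitted(action: dict, restrictions: list) -> bool:
    """An action is permitted only if every attached restriction allows it."""
    return all(r.allows(action) for r in restrictions)
```

Under these assumptions, a flight booking on a listed airline is rejected, while a hotel booking is untouched by the rule because the condition does not select it.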

Intention-Centric Interface 

Satisfying Customer Intentions, such as Planning for
Retirement or Relocating, requires a specialized interface.
Customer Intentions require extensive planning and coordination
across many areas, ranging from financial security,
housing and transportation to healthcare, personal and professional
development, and entertainment, among others.
Satisfying Intentions requires a network of complementary
businesses, working across industries, to help meet consumers'
needs.

An Intention-Centric Interface is a user interface designed
to help the user manage personal Intentions. At any given
point, the interface content is customized to show only
content that relates to that particular Intention. The
Intention-Centric Interface allows the user to manage the
process of satisfying that particular Intention. This involves
a series of discrete steps and a set of content areas the user
can access. At any point, the user can also switch the
interface to manage a different Intention, and this act will
change the content of the interface to include only that
content which is relevant to the satisfaction of the newly
selected Intention.

FIG. 14 provides a detailed description of the data model
needed to support an Intention-Centric Interface. Each User
Persona 1410 (see FIG. 13 for a more detailed description of
the Persona data model) has any number of active User
Intentions 1420. Each active User Intention is given a
Nickname 1430, which is the display name the user sees on
the screen. Each active User Intention also contains a
number of Data Fields 1440, which contain any user data
collected throughout the interaction with the user. For
instance, if the user had filled out a form on the screen and
one of the fields was Social Security Number, the corresponding
Data Field would contain Name="SSN" 1450,
Value="999-99-9999" 1460. Each User Intention also keeps
track of Intention Step 1470 completion status. The Completion
1480 field indicates whether the user has completed the
step. Every User Intention is a user-specific version of a
Generic Intention 1490, which is the default model for that
Intention for all users. The Generic Intention is customized
through Custom Rules 1411 and 1412 that are attached to the
sub-steps in the Intention. These Custom Rules are patterns
describing how the system will customize the Intention for
each individual user using the individual user's profile
information.
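
As a rough illustration of the data model of FIG. 14, the sketch below represents a Generic Intention as an ordered list of steps and a User Intention as a per-user copy carrying a Nickname, Data Fields and per-step Completion flags. The class layout, the step names and the nickname are assumptions made for the example; Custom Rules are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class GenericIntention:
    """Default model of an Intention shared by all users."""
    name: str
    steps: list


@dataclass
class UserIntention:
    """User-specific version of a Generic Intention."""
    generic: GenericIntention
    nickname: str
    data_fields: dict = field(default_factory=dict)  # Name -> Value
    completion: dict = field(default_factory=dict)   # step -> completed?

    def __post_init__(self):
        # Every step of the generic model starts out incomplete.
        for step in self.generic.steps:
            self.completion.setdefault(step, False)

    def complete(self, step: str) -> None:
        if step not in self.completion:
            raise KeyError(f"unknown step: {step}")
        self.completion[step] = True

    def remaining_steps(self) -> list:
        # Preserve the order defined by the generic model.
        return [s for s in self.generic.steps if not self.completion[s]]


moving = GenericIntention(
    "Moving to a New Community",
    ["Find a place to live", "Move household possessions", "Notify contacts"],
)
mine = UserIntention(generic=moving, nickname="Move to Tahoe")
mine.data_fields["SSN"] = "999-99-9999"
mine.complete("Find a place to live")
```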


Statistical Agent 

An agent keeps track of key statistics for each user. These
statistics are used in a manner similar to the Tamagotchi
virtual reality pet toy to encourage certain behaviors from
the user. The statistics that are recorded are frequency of
login, frequency of rating of content such as news articles,
and activity of agents, measured by the number of tasks
which they perform in a certain period. This information is
used by the system to emotionally appeal to the user to
encourage certain behaviors.

FIG. 15 describes the process for generating the page that
displays the agent's current statistics. When the user
requests the agent statistics page 1510 with the client
browser, the server retrieves the user's statistics 1520 from
the user profile database 1530. The server then performs
the mathematical calculations necessary to create a normalized
set of statistics 1540. The server then retrieves the
formulas 1550 from the content database 1560 that will be
used to calculate the user-centric statistics. Graphs are then
generated 1570 using the generic formulas and that user's
statistics. These graphs are inserted into a template to create
the statistics page 1580. This page is then returned to the
user 1590.
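
The specification does not state how the statistics are normalized 1540. One simple possibility, shown here purely as an assumption, is to scale each raw statistic against a per-statistic ceiling and clamp the result to the range 0 to 1 so that the values can be graphed on a common axis. The statistic names and ceilings are hypothetical.

```python
def normalize_statistics(raw: dict, ceilings: dict) -> dict:
    """Scale each raw statistic to [0, 1] against an assumed ceiling value."""
    normalized = {}
    for name, value in raw.items():
        ceiling = ceilings[name]
        if ceiling <= 0:
            # A non-positive ceiling carries no scale; report zero.
            normalized[name] = 0.0
        else:
            # Clamp so that values above the ceiling do not exceed 1.0.
            normalized[name] = min(max(value / ceiling, 0.0), 1.0)
    return normalized


raw_stats = {"logins_per_week": 7, "ratings_per_week": 30, "agent_tasks_per_week": 5}
stat_ceilings = {"logins_per_week": 14, "ratings_per_week": 20, "agent_tasks_per_week": 10}
normalized_stats = normalize_statistics(raw_stats, stat_ceilings)
```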

Personalized Product Report Service

The system provides a Consumer Reports-like service that is
customized for each user based on a user profile. The system
records and provides ratings from users about product
quality and desirability on a number of dimensions. The
difference between this system and traditional product quality
measurement services is that the ratings that come back
to the users are personalized. This service works by finding
the people who have the closest match to the user's profile
and have previously rated the product being asked for. Using
this algorithm helps to ensure that the product reports
sent back to the user contain statistics only from people who
are similar to that user.

FIG. 16 describes the algorithm for determining the
personalized product ratings for a user. When the user
requests a product report 1610 for product X, the algorithm
retrieves the profiles 1620 from the profile database 1630
(which includes product ratings) of those users who have
previously rated that product. Then the system retrieves the
default thresholds 1640 for the profile matching algorithm
from the content database 1650. It then maps the short
list of users along several dimensions specified in the profile
matching algorithm 1660. The top n (specified previously as
a threshold variable) nearest neighbors are then determined
and a test is performed to decide if they are within distance
y (also specified previously as a threshold variable) of the
user's profile in the set 1670 using the results from the
profile matching algorithm. If they are not within the
threshold, then the threshold variables are relaxed 1680, and
the test is run again. This processing is repeated until the test
returns true. The product ratings from the smaller set of n
nearest neighbors are then used to determine a number of
product statistics 1690 along several dimensions. Those
statistics are inserted into a product report template 1695 and
returned to the user 1697 as a product report.
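
The test-and-relax loop of FIG. 16 can be sketched as follows. Several details are assumptions, since the specification leaves them open: profiles are treated as numeric vectors compared by Euclidean distance, relaxation is modeled as adding a fixed step to the distance threshold y, and the only product statistic computed is the mean rating. The function and parameter names are hypothetical.

```python
import math


def profile_distance(a: dict, b: dict, dimensions: list) -> float:
    """Euclidean distance between two profiles over the chosen dimensions."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dimensions))


def personalized_rating(user_profile, raters, dimensions, n, y, relax_step=0.5):
    """Return (mean rating of the n nearest raters, final threshold y)."""
    if not raters:
        return None
    if n < 1 or relax_step <= 0:
        raise ValueError("n must be >= 1 and relax_step must be positive")
    # Rank every prior rater by distance to the requesting user.
    ranked = sorted(
        ((profile_distance(user_profile, r["profile"], dimensions), r)
         for r in raters),
        key=lambda pair: pair[0],
    )
    nearest = ranked[:n]
    # Test whether all n nearest neighbors lie within y; if not, relax y
    # and run the test again until it passes.
    while any(dist > y for dist, _ in nearest):
        y += relax_step
    mean = sum(r["rating"] for _, r in nearest) / len(nearest)
    return mean, y


user = {"age": 30, "household_size": 2}
raters = [
    {"profile": {"age": 31, "household_size": 2}, "rating": 4},
    {"profile": {"age": 30, "household_size": 4}, "rating": 2},
    {"profile": {"age": 60, "household_size": 6}, "rating": 5},
]
result = personalized_rating(user, raters, ["age", "household_size"], n=2, y=1.5)
```

In this example the second-nearest rater lies at distance 2, so the initial threshold of 1.5 is relaxed once before the test passes, and the distant third rater never contributes to the report.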

Personal Profile and Services Ubiquity 

This system provides one central storage place for a
person's profile. This storage place is a server available
through the public Internet, accessible by any device that is
connected to the Internet and has appropriate access.
Because of the ubiquitous accessibility of the profile, numerous
access devices can be used to customize services for the
user based on his profile. For example, a merchant's web site
can use this profile to provide personalized content to the
user. A Personal Digital Assistant (PDA) with Internet
access can synchronize the person's calendar, email, contact
list, task list and notes on the PDA with the version stored
in the Internet site. This enables the person to maintain
only one version of this data in order to have it available
whenever it is needed and in whatever formats it is needed.

FIG. 17 presents the detailed logic associated with the
many different methods for accessing this centrally stored
profile. The profile database 1710 is the central storage place
for the users' profile information. The profile gateway server
1720 receives all requests for profile information, whether
from the user himself or merchants trying to provide a
service to the user. The profile gateway server is responsible
for ensuring that information is only given out when the
profile owner specifically grants permission. Any device that
can access the public Internet 1730 over TCP/IP (a standard
network communications protocol) is able to request information
from the profile database via intelligent HTTP
requests. Consumers will be able to gain access to services
from devices such as their televisions 1740, mobile phones,
Smart Cards, gas meters, water meters, kitchen appliances,
security systems, desktop computers, laptops, pocket
organizers, PDAs, and their vehicles, among others.
Likewise, merchants 1750 will be able to access those
profiles (given permission from the consumer who owns
each profile), and will be able to offer customized, personalized
services to consumers because of this.

One possible use of the ubiquitous profile is for a hotel 
chain. A consumer can carry a Smart Card that holds a digital 
certificate uniquely identifying him. This Smart Card's
digital certificate has been issued by the system, which has
recorded his profile information in the profile database.
The consumer brings this card into a hotel chain and checks
in. The hotel employee swipes the Smart Card and the
consumer enters his PIN, unlocking the digital
certificate. The certificate is sent to the profile gateway 
server (using a secure transmission protocol) and is authen- 
ticated. The hotel is then given access to a certain part of the 
consumer's profile that he has previously specified. The 
hotel can then retrieve all of the consumer's billing infor- 
mation as well as preferences for hotel room, etc. The hotel 
can also access the consumer's movie and dining prefer- 
ences and offer customized menus for both of them. The 
hotel can offer to send an email to the consumer's spouse 
letting him/her know the person checked into the hotel and 
is safe. All transaction information can be uploaded to the 
consumer's profile after the hotel checks him in. This will 
allow partners of the hotel to utilize the information about 
the consumer that the hotel has gathered (again, given the 
consumer's permission). 

Intention Value Network 

In an Intention Value Network, the overall integrator 
system coordinates the delivery of products and services for 
a user. The integrator manages a network of approved 
suppliers providing products and services, both physical and 
virtual, to a user based on the user's preferences as reflected 
in the user's profile. The integrator manages the relationship 
between suppliers and consumers and coordinates the suppliers'
fulfillment of consumers' intentions. It does this by
providing the consumer with information about products and 
suppliers and offering objective advice, among other things. 

FIG. 18 discloses the detailed interaction between a 
consumer and the integrator involving one supplier. The user 
accesses a Web Browser 1810 and requests product and 
pricing information from the integrator. The request is sent 
from the user's browser to the integrator's Web/Application 
Server 1820. The user's preferences and personal information
are obtained from an integrator's customer profile database
1830 and returned to the Web/Application server. The
requested product information is extracted from the suppli- 
er's product database 1840 and customized for the particular 
customer. The Web/Application server updates the suppli- 
er's customer information database 1850 with the inquiry 
information about the customer. The product and pricing 
information is then formatted into a Web Page 1860 and 
returned to the customer's Web Browser. 

Summary Agent 

A suite of software agents running on the application and
web servers is programmed to take care of repetitive or
mundane tasks for the user. The agents work according to
rules set up by the user and are only allowed to perform tasks 
explicitly defined by the user. The agents can take care of 
paying bills for the user, filtering content and emails, and 
providing a summary view of tasks and agent activity. The 
user interface for the agent can be modified to suit the 
particular user. 

FIG. 19 discloses the logic, in accordance with a preferred
embodiment, by which an agent generates a verbal
summary for the user. When the user requests the summary
page 1900, the server gets the user's agent preferences 1920,
such as agent type, rules and summary level from the user
profile database 1930. The server gets the content 1940, such
as emails, to do list items, news, and bills, from the content
database 1950. The agent parses all of this content, using the
rules stored in the profile database, and summarizes the
content 1960. The content is formatted into a web page 1970
according to a template. The text for the agent's speech is
generated 1980, using the content from the content database
1990 and speech templates stored in the database. This
speech text is inserted into the web page 1995 and the page
is returned to the user 1997.

Trusted Third Party

The above scenario requires the web site to maintain a
guarantee of privacy of information according to a published
policy. This system is the consumer's Trusted Third Party,
acting on his behalf in every case, erring on the side of
privacy of information, rather than on the side of stimulation
of commerce opportunities. The Trusted Third Party has a
set of processes in place that guarantee certain compliance
with the stated policy.

"meCommerce"

This word extends the word "eCommerce" to mean
"personalized electronic commerce." FIG. 20 illustrates a
display login in accordance with a preferred embodiment.
The display is implemented as a Microsoft Internet Explorer
application with an agent 2000 that guides a user through the
process of interacting with the system to customize and
personalize various system components to gather information
and interact with the user's personal requirements. A
user enters a username at 2010 and a password at 2020 and
selects a button 2040 to initiate the login procedure. As the
logo 2030 suggests, the system transforms electronic commerce
into a personalized, so called "me" commerce.

FIG. 21 illustrates a managing daily logistics display in
accordance with a preferred embodiment. A user is greeted
by an animated agent 2100 with a personalized message
2190. The user can select from various activities based on
requirements, including travel 2110, household chores 2120,
finances 2130 and marketplace activities 2140. Icons 2142
for routine tasks such as e-mail, calendaring and document
preparation are also provided to facilitate rapid navigation
from one activity to another. Direct links 2146 are also
provided to allow transfer of news and other items of
interest. Various profiles can be selected based on where the
user is located, for example, work, home or vacation. The
profiles can be added 2170 as a user requires a new profile
for another location. Various items 2180 of personal information
are collected from the user to support various
endeavors. Moreover, permissions 2150 are set for items
2180 to assure information is timely and current.

FIG. 22 illustrates a user main display in accordance with
a preferred embodiment. World 2200 and local news 2210 are
provided based on a user's preference. The user has also
selected real estate 2230 as an item to provide direct
information on the main display. Also, a different agent 2220
is provided based on the user's preference.

FIG. 23 illustrates an agent interaction in accordance with
a preferred embodiment. The agent 2310 is communicating
information 2300 to a user indicating that the user's life
insurance needs have changed and pointing the user to the
chart that best summarizes the information for the user.
Particular tips 2395 are provided to facilitate more detailed
information based on current user statistics. A chart 2370 of
the user's life insurance needs is also highlighted at the
center of the display to assist the user in determining
appropriate action. A button 2380 is provided to facilitate
changing the policy and a set of buttons 2390 are provided
to assist a user in selecting various views of the user's
insurance requirements.

Event Backgrounder

An Event Backgrounder is a short description of an
upcoming event that is sent to the user just before an event.
The Event Backgrounder is constantly updated with the
latest information related to this event. Pertinent information
such as itinerary and logistics are included, and other useful
information, such as people the user knows who might be in
the same location, are also included. The purpose of the
Event Backgrounder is to provide the most up-to-date information
about an event, drawing from a number of resources,
such as public web sites and the user's calendar and contact
lists, to allow the user to react optimally in a given situation.

Vicinity Friend Finder

This software looks for opportunities to tell the user when
a friend, family member or acquaintance is or is going to be
in the same vicinity as the user. This software scans the
user's calendar for upcoming events. It then uses a geographic
map to compare those events with the
calendar events of people who are listed in his contact list.
It then informs the user of any matches, thus telling the user
that someone is scheduled to be near him at a particular time.

Information Overload

The term information overload is now relatively well understood
in both its definition as well as its implications and
consequences. People have a finite amount of attention that
is available at any one time, but there is more and more
vying for that attention every day. In short, too much
information and too little time are the primary factors
complicating the lives of most knowledge workers today.

The first attempts to dynamically deal with information
overload were primarily focused on the intelligent filtering
of information such that the quantity of information would
be lessened. Rather than simply removing random bits of
information, however, most of these approaches tried to be
intelligent about what information was ultimately presented
to the user. This was accomplished by evaluating each
document based on the user's interests and discarding the
less relevant ones. It follows, therefore, that the quality was
also increased.

Filtering the information is only a first step in dealing with
information in this new age. Arguably, just as important as
the quality of the document is having ready access to it.
Once you have entered a meeting, a document containing
critical information about the meeting subject delivered to
your office is of little value. As the speed of business
continues to increase, fueled by the technologies of
interconnectedness, the ability to receive quality information
wherever and whenever you are becomes critical. This new
approach is called intelligent information delivery and is
heralding in a new information age.

A preferred embodiment demonstrates the intelligent
information delivery theory described above in an attempt to
not only reduce information overload, but to deliver high
quality information where and when users require it. In
other words, the system delivers the right information to the
right person at the right time and the right place.

Active Knowledge Management System Description

FIG. 24 is a block diagram of an active knowledge
management system in accordance with a preferred embodiment.
The system consists of the following parts: back-end
2400 connection to one or more servers, personal mobile
wireless clients (Awareness Machine) 2430, 2436, public
clients (Magic Wall) 2410, 2420, web clients 2446, 2448, and
e-mail clients 2450, 2460.
Back-end Server (2400) Processes 

FIG. 25 is a block diagram of a back-end server in
accordance with a preferred embodiment. The back-end
(2400 of FIG. 24) is a computer system that has the
following software active: Intelligent Agents Coordinator
(Munin) 2580, Information Prioritization Subsystem 2530, a
set of continuously and periodically running information
gathering and processing Intelligent Agents 2500, 2502 and
2504, User Profiles Database 2542 and supporting software,
Information Channels Database 2542 and supporting
software, communications software 2550, information
transformation software 2560, and auxiliary software.
The Awareness Machine (2446 & 2448 of FIG. 24) 

The Awareness Machine is a combination of a hardware
device and a software application. The hardware consists of
a handheld personal computer and a wireless communications
device. The Awareness Machine reflects a constantly
updated state-of-the-owner's-world by continually receiving
a wireless trickle of information. This information, mined
and processed by a suite of intelligent agents, consists of
mail messages, news that meets each user's preferences,
schedule updates, background information on upcoming
meetings and events, as well as weather and traffic.

The Intelligent Agent Coordinator 2580 of FIG. 25 is also
the user's "interface" to the system, in that whenever the
user interacts with the system, regardless of the GUI or other
end-user interface, they are ultimately dealing with (asking
questions of or sending commands to) the Intelligent Agent
Coordinator. The Intelligent Agent Coordinator has four
primary responsibilities: 1) monitoring user activities, 2)
handling information requests, 3) maintaining each user's
profile, and 4) routing information to and from users and to
and from the other respective agents.
Monitoring User Activities 

Anytime a user triggers a sensor, the Intelligent Agent
Coordinator receives an "environmental cue." These cues
not only enable the Intelligent Agent Coordinator to gain an
understanding of where users are for information delivery
purposes, but also to learn the standard patterns (arrival
time, departure time, etc.) of each person's life. These
patterns are constantly being updated and refined in an
attempt to increase the system's intelligence when delivering
information. For instance, today it is not uncommon for
a person to have several email accounts (work-based,
home-based, mobile-based, etc.) as well as several different
computers involved in the retrieval process for all of these
accounts. Thus, for the Intelligent Agent Coordinator to be
successful in delivering information to the correct location it
must take into account all of these accounts and the times
that the user is likely to be accessing them in order to
maximize the probability that the user will see the
information. This will be discussed further in another section.
Handling Information Requests 

The Intelligent Agent Coordinator handles information
requests from other agents in order to personalize
information intended for each user and to more accurately reflect
each user's interests in the information they are given. These
requests will commonly be related to the user's profile. For
instance, if an agent were preparing a traffic report for a user
it may request the traffic region (search string) of that user
from the Intelligent Agent Coordinator. All access to the
user's profile data is handled in this manner.
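
The request pattern described above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class names AgentCoordinator and TrafficAgent, the method names, and the attribute key "trafficRegion" are hypothetical and do not come from the described system. The point shown is that profile data stays private to the coordinator and agents obtain it only by asking.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: agents never read profile data directly; they ask the coordinator. */
class AgentCoordinator {
    // userId -> (attribute name -> value); private so agents cannot bypass the coordinator
    private final Map<String, Map<String, String>> profiles = new HashMap<>();

    /** Records or updates one profile attribute for a user. */
    void updateProfile(String userId, String attribute, String value) {
        profiles.computeIfAbsent(userId, id -> new HashMap<>()).put(attribute, value);
    }

    /** Single entry point through which agents request profile data. */
    Optional<String> requestProfileValue(String userId, String attribute) {
        Map<String, String> profile = profiles.get(userId);
        return profile == null ? Optional.empty() : Optional.ofNullable(profile.get(attribute));
    }
}

/** Hypothetical traffic agent that personalizes its report via the coordinator. */
class TrafficAgent {
    private final AgentCoordinator coordinator;

    TrafficAgent(AgentCoordinator coordinator) {
        this.coordinator = coordinator;
    }

    /** Uses the user's traffic region as the search string for the report. */
    String prepareReport(String userId) {
        return coordinator.requestProfileValue(userId, "trafficRegion")
                .map(region -> "Traffic report for " + region)
                .orElse("No traffic region on file");
    }
}
```

Keeping the profile map private to one class is one simple way to guarantee that every read passes through a single, auditable method.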
Maintaining User Profiles 

User profiles contain extensive information about the 
users. This information is a blend of user-specified data and 
information that the Intelligent Agent Coordinator has 
learned and extrapolated from each user's information and 
activities. In order to protect the data contained in the 
profiles, the Intelligent Agent Coordinator must handle all 
user information requests. The Intelligent Agent Coordinator 
is constantly modifying and updating these profiles by 
watching the user's activities and attempting to learn the 
patterns of their lives in order to assist in the more routine, 
mundane tasks. The Intelligent Agent Coordinator also 
employs other agents to glean meaning from each user's 
daily activities. These agents mine this data trying to dis- 
cover indications of current interests, long-term interests, as 
well as time delivery preferences for each type of informa- 
tion. Another important aspect of the Intelligent Agent 
Coordinator's observations is that it also tries to determine 
where each user is physically located throughout the day for 
routing purposes. 
Information Routing 

Most people are mobile throughout their day. The Intel- 
ligent Agent Coordinator tries to be sensitive to this fact by 
attempting to determine, both by observation (unsupervised 
learning) and from cues from the environment, where users 
are or are likely to be located. This is certainly important for 
determining where to send the user's information, but also 
for determining in which format to send the information. For 
instance, if a user were at her desk and using the web client, 
the Intelligent Agent Coordinator would be receiving indi- 
cations of activity from her PC and would know to send any 
necessary information there. In addition, because desktop 
PCs are generally quite powerful, a full-featured, graphically 
intense version could be sent. However, consider an alter- 
native situation: the Intelligent Agent Coordinator has 
received an indication (via the keycard reader next to the 
exit) that you have just left the building. Minutes later the 
Intelligent Agent Coordinator also receives notification that 
you have received an urgent message. The Intelligent Agent 
Coordinator, knowing that you have left the building and 
having not received any other indications, assumes that you 
are reachable via your handheld device (for which it also 
knows the capabilities) and sends the text of the urgent 
message there, rather than a more graphically-oriented ver- 
sion. 
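
A minimal Java sketch of the routing decision in this example follows. The enum and class names are hypothetical, only the two cues mentioned in the text are modelled, and the fallback to the handheld device when no cue has been seen is an assumption made for the sketch rather than stated behaviour.

```java
/** Hypothetical environmental cues; the text mentions PC activity and a keycard reader at the exit. */
enum Cue { PC_ACTIVITY, LEFT_BUILDING }

/** Hypothetical delivery targets. */
enum Target { DESKTOP_PC, HANDHELD }

/** Hypothetical delivery formats. */
enum Format { RICH_GRAPHICAL, PLAIN_TEXT }

/** A routing decision: where to send the information and in which format. */
final class Route {
    final Target target;
    final Format format;

    Route(Target target, Format format) {
        this.target = target;
        this.format = format;
    }
}

class InformationRouter {
    private Cue lastCue; // most recent cue observed for the user; null until one arrives

    /** Records the latest environmental cue. */
    void onCue(Cue cue) {
        lastCue = cue;
    }

    /** Chooses destination and format from the latest cue. */
    Route route() {
        if (lastCue == Cue.PC_ACTIVITY) {
            // Desktop PCs are powerful, so a full-featured graphical version can be sent.
            return new Route(Target.DESKTOP_PC, Format.RICH_GRAPHICAL);
        }
        // The user has left the building (or nothing is known, an assumption of this sketch):
        // fall back to the handheld device and send text only.
        return new Route(Target.HANDHELD, Format.PLAIN_TEXT);
    }
}
```

A fuller version would also weigh learned patterns such as usual arrival and departure times; the sketch reacts only to the most recent cue.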

Inherent Innovations 

The Active Knowledge Management system represents 
some of the most advanced thinking in the world of knowl- 
edge management and human computer interaction. Some of 
the primary innovations include the following: 

The Intelligent Agent Coordinator as illustrated above. 

The development, demonstration, and realization of the 
theory of Intelligent Information Delivery 

Support for several channels of information delivery, all
of which utilize a common back-end. For instance, if a
user is in front of a Magic Wall the information will be
presented in a multimedia-rich form. If the system
determines that the user is mobile, the information will
be sent to their Awareness Machine in standard text.
It facilitates delivery of information whenever and
wherever a user requires the information.

Personalization of information based not only on a static 
user profile, but also by taking into account history of 
the user interactions and current real-time situation 
including "who, where, and when" awareness. 
Utilization of a fast and scalable Information Prioritization
Subsystem that takes into account the Intelligent Agents
Coordinator's opinion, user preferences, and history of
user interactions. It takes the load of mundane decisions
off the Intelligent Agents, thereby allowing
the agents to be much more sophisticated and precise
without compromising the system scalability.

Speech recognition and speech synthesis in combination
with intelligent agent animated representation and tactile
input provide for efficient, intuitive, and emotionally
rewarding interaction with the system.



Client Reporting Subsystem Model 
Context

The Reporting subsystem is used by other subsystems on
the client to report (read: make a matter of record) various
data. The subsystem makes no assumptions about the type of
data it handles; the data could be fault reports (as part of an
architectural service) or lead management information (as
part of an application data service). The Reporting subsystem
is in this sense part of the infrastructure: it is an
underlying set of services available everywhere on the
client. The Reporting subsystem uses the Communications
subsystem to store and forward data.
Architecture Overview 

The Reporting subsystem offers services to every client
subsystem. It comprises a mechanism for messaging within
the client application and between the client and the server.
The Reporting mechanism uses the Communications subsystem
to store and forward data. The Exceptions use this
Reporting mechanism for reporting information about errors
only.

Role

The Reporting subsystem provides a set of infrastructural 
services which allow architectural components to report 
information. The subsystem makes no stipulation about the 
information reported, although it does contain components 
that map to certain types of reported information, such as
system faults or customer interaction information. Part of 
the subsystem interface is presented as a set of Exceptions, 
which allow the automatic reporting of error conditions 
encountered during processing. 

The subsystem accepts data and forwards it in an appropriate
format to the Communications subsystem. It captures
and reports Exceptions generated during processing
that are the result of error conditions. It is able to deal with
any type of report that needs to be made, from error logging
to sales leads. It is flexible enough to record new types of
information as required. It is also flexible enough to be able
to add new types of report as required. In addition, it is able
to deal with the non-availability of certain information
during the logging process.

Responsibilities

The Client Reporting Subsystem is responsible for pro- 
viding a set of Exceptions for use by all parts of the 
application. The Subsystem is also responsible for logging 
fault reports, user interaction reports, application heartbeat
reports, message receipts, referrals or leads, and Manage- 
ment Information System entries. 
Exclusions 

The subsystem is not responsible for gathering information
from interface interactions or elsewhere; neither is it
responsible for deciding which of a set of data needs to be
reported. Reporting does not include the printing of reports.



Component Specifications

Client Exception        This is a set of Exception classes which, using
Reporting Component     the Client Reporting Component reporting
                        services, stores and sends fault information.

Client Reporting        This is the mechanism by which information for
Component               all reports is collected, formatted, and
                        submitted to the Client Communications
                        Subsystem.



Creation, Existence, and Management 

The key element of the subsystem is a static class which 
manages the creation of report objects — a report factory. 
This class is instantiated by the Communications subsystem 
(which manages client configuration) and is always avail- 
able. It is a severe error if it is not. Reports are generated on 
an ad hoc basis as needed. 
Sizing and Capacity 

Whatever the requirements of the client architecture, the 
'throughput' of the Client Reporting subsystem (understood
as the number of reports, of every type, that are requested in 
a given time) will not place significant strain on system 
resources. Most of the capacity requirements for reports are 
absorbed by the Communications subsystem, which must 
arrange for the storage and transmission of those reports. 
Nevertheless, the subsystem must be able to deal with 
whatever throughput is demanded by the architecture, and 
the design takes into consideration the estimated workload 
generated by each part of the architecture. 
Performance 

Performance is not critical for the Client Reporting sub- 
system. Reported data, with a few exceptions, is stored 
before transmission, and so a delay before data is sent is 
anticipated. Certain types of severe or critical faults need to 
be reported at once, but the low bandwidth required for these 
transmissions will not present performance problems. 
Design Guidelines 

The reporting needs of the architecture fluctuate, although 
a core set of capabilities (fault reporting, lead management, 
interaction reporting) always remain requirements. The sub- 
system is flexible enough not only to extend or reduce its 
capabilities, but also to adjust the level of detail and the 
nature of data it records for each capability. Implementation 
of the system follows the project Java coding standards. 
Logical Components 

Exceptions within the Client Exception Reporting com- 
ponent call the Client Reporting component to create fault 
reports. 



Component Descriptions

Client Exception        This is a hierarchy of Exception classes
Component               structured to assist in the handling and passing
                        of Exceptions. These Exceptions will accept
                        information about the Exception event they
                        represent. With this information and whatever
                        else the class knows about its own event, the
                        component will use the Client Reporting
                        Component to create fault reports.

Client Reporting        A static report factory will accept requests from
Component               other components in the form of a signalled
                        event. Based on this event, the factory will
                        manufacture a report of a certain type. The
                        report will then be populated with information
                        supplied by the calling component or reporter.
Submitting a Report

Component A signals the Client Reporting subsystem to 
indicate that a reportable event has occurred. The Client 
Reporting subsystem then requests information about the 
event and creates a report. Finally, the Client Reporting 
subsystem signals to the Client Communication subsystem 
that the report is ready to be sent. 
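
The three steps above (signal, request information, hand off for sending) might look roughly like the following Java sketch. All names here (ReportType, Reporter, Report, ReportFactory, CommunicationsSubsystem and their methods) are hypothetical stand-ins chosen for illustration; the source does not give the actual class design beyond describing a static report factory.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical report types, drawn from the kinds of report the text lists. */
enum ReportType { FAULT, USER_INTERACTION, HEARTBEAT, LEAD }

/** The calling component, asked by the factory for details of the event. */
interface Reporter {
    Map<String, String> describeEvent();
}

/** A report populated with information supplied by the reporter. */
final class Report {
    final ReportType type;
    final Map<String, String> fields;

    Report(ReportType type, Map<String, String> fields) {
        this.type = type;
        this.fields = new HashMap<>(fields); // defensive copy
    }
}

/** Stand-in for the Communications subsystem, which stores and forwards reports. */
interface CommunicationsSubsystem {
    void storeAndForward(Report report);
}

/** Static report factory: manufactures a report for a signalled event and hands it on. */
final class ReportFactory {
    private static CommunicationsSubsystem communications;

    private ReportFactory() { }

    /** Called once while the client is being configured. */
    static void configure(CommunicationsSubsystem comms) {
        communications = comms;
    }

    /** A component signals an event; the factory requests details, builds and submits the report. */
    static Report signal(ReportType type, Reporter reporter) {
        if (communications == null) {
            // The text treats an unavailable factory as a severe error.
            throw new IllegalStateException("ReportFactory has not been configured");
        }
        Report report = new Report(type, reporter.describeEvent());
        communications.storeAndForward(report);
        return report;
    }
}
```

Because Reporter has a single method, a component can pass a lambda that returns its event details, which keeps the factory ignorant of what kind of data it is reporting.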
Throwing an Exception 

First, Component A signals that an exceptional event has
occurred by instantiating an appropriate Exception. Second,
Component A passes information relevant to the reportable
event to the Exception. Third, Component A requests that the
Exception be thrown. Fourth, the Client Exception
Reporting component submits a report to be sent based on
the information available using the Client Reporting
component as outlined above. Finally, the Client Exception
Reporting component throws its Exception.
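
As an illustration of this sequence, the following Java sketch shows an exception that files a fault report before it is thrown. The names FaultReports, ReportingException, with and raise are hypothetical, and FaultReports is only a recording stand-in for the Client Reporting component.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for the Client Reporting component; here it only records fault reports. */
final class FaultReports {
    static final List<Map<String, String>> submitted = new ArrayList<>();

    private FaultReports() { }

    static void submit(Map<String, String> fault) {
        submitted.add(new LinkedHashMap<>(fault));
    }
}

/** Hypothetical base class for exceptions that report themselves before being thrown. */
class ReportingException extends Exception {
    private final Map<String, String> details = new LinkedHashMap<>();

    /** Step 1: the component instantiates an appropriate exception. */
    ReportingException(String message) {
        super(message);
    }

    /** Step 2: the component passes information relevant to the event. */
    ReportingException with(String key, String value) {
        details.put(key, value);
        return this;
    }

    /** Steps 3 to 5: on request, submit a fault report and then throw. */
    void raise() throws ReportingException {
        Map<String, String> fault = new LinkedHashMap<>(details);
        fault.put("message", getMessage());
        fault.put("type", getClass().getSimpleName());
        FaultReports.submit(fault);
        throw this;
    }
}
```

A caller would write something like `new ReportingException("printer offline").with("component", "Printing").raise();` so that the report is always filed before the exception propagates.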

Local Content Subsystem Model 

Context 

The Local Content subsystem provides all content 
required by the application. This includes both static and 
dynamic content. It also provides business services required 
by the application. It operates on a "storage and retrieval" 
basis, storing the data obtained from the user and the 
business, and providing mechanisms to retrieve that data. 
The Local Content subsystem is used by the ISF subsystem 
to provide content. It uses the Communication subsystem to 
receive business data. 
Architecture Overview 

Objects in the Local Content subsystem are created by the 
Initialization subsystem of the Application Architecture. 
Services provided by the Local Content subsystem are also 
accessed through the Application Architecture, via the Ini- 
tialization and ISF subsystems. 

The example components of the Local Content subsystem 
reflect different types of business knowledge and processes 
required by the application. User Data and Business Data 
involve data collected respectively from the user and the 
business. The Calculation component performs complex 
calculations, and the Product component represents the 
products used by the business. The Content Providers com- 
ponent defines static media content. 
Role 

The Local Content subsystem provides static and dynamic 
content to the application. It also provides all business- 
specific services required by the application. 
Responsibilities 

The Local Content subsystem provides static and dynamic
media to the application. The Local Content subsystem also
stores user-entered details and business data, and performs
business calculations.
Exclusions 

All application-specific behaviour is provided through the 
Application Personality. This behaviour is defined by Ham- 
let scripts, which also define navigation between scripts. The 
scripts are divided into metaphors, each of which is an 
embodiment of a style of interaction. 

Access to media is provided through the Content Provid- 
ers component, which belongs to this subsystem. However, 
the objects of this component are created automatically by 
the System Initialiser component from a contents file defined 
in the Application Personality. 
Creation, Existence, and Management 

The Business Data component is created and initialized at
the time of System Initialization. It exists for the life of the
system.

The User Data component is available for the entire time 
a customer is using the system. When a customer session 
ends, references to all objects in the User Data component
are released so that the objects can be garbage collected. 

Objects in the Local Content subsystem (with the excep- 
tion of the Business Data Component) are created and 
initialized either internally or by the Initialization sub- 
system. References to these objects are managed by the ISF 
subsystem. Business Data objects are created, initialized and 
managed by the Communications subsystem. 
Logical Components 

The ISF does not actually know about the Local Content 
subsystem. The Local Content subsystem implements a set 
of interfaces defined by the ISF. The Initialization subsystem 
uses the scripts defined in the Application Personality to 
define which objects (implementing those interfaces) need to 
be used to retrieve content. The ISF uses the Initialization 
component to create those objects, then manages them. 

Developers of the Application Personality can easily view 
their scripts as directly managing the local content objects. 
This allows the local content objects to be developed with- 
out knowledge of the Application Architecture layer. 



Component Descriptions

User Data               Stores and retrieves data entered by the user, and
                        initiates calculations on stored data.

Business Data           Stores and retrieves data provided by the
                        business.

Calculation             Performs complex calculations.

Product                 Provides access to information associated with
                        particular marketing products.

Content Providers       Provides access to static media. This is an
                        implementation of an interface and is not
                        documented as a separate component.

Interface Support Framework Subsystem Model

Context

The Interface Support Framework subsystem is part of the 
Application Architecture Layer. It provides a rich interactive 
environment which exploits the full potential of a dynamic, 
multi-media interface. The ISF is built around a theatrical 
metaphor where every object is expected to exert dynamic 
behavior. 

Objects within the ISF are initialized by the Application 
Initialization subsystem within the Application Architecture 
Layer and utilize the services of the Content Players, Print- 
ing and Reporting subsystems in the Technical Architecture 
Layer. 

Architecture Overview 

Objects within the ISF subsystem are initialized by the 
Application Initialization Subsystem. Reporting and Trans- 
action Interface Services are used to log ISF data for the 
Technical Architecture layer to report or print. The Content 
Player subsystem within the Technical Architecture is used 
by the ISF to present media to the user. 

The ISF subsystem is built upon a layered architecture
which follows the Model-View-Controller pattern. The Factual
component contains the object model of the business
and definitions of business media content. The Visual com- 
ponent displays and manipulates media to provide a view of 
the business model to the user. The Behavioral component 
controls all interactions between the Visual and Factual 
components. 
Role 

The ISF subsystem provides the services for the applica- 
tion to present multimedia content in a controlled way. It 
also provides the capability to react to user input and effect
changes to the scene.
Responsibilities

The ISF subsystem displays each scene of the application,
and modifies the content of a scene while it is displayed. In
addition, the ISF subsystem enables navigation between
scenes, reacts to user interaction, retrieves business content,
performs business functions and calculations, and provides
common user interface constructs. Other responsibilities of
the ISF include initiating print jobs and video conference
sessions, reporting on user entry into a scene, duration of a
session, user interaction with a role, user navigation, and
reporting on errors occurring within the ISF. Finally, the ISF
manages user sessions, and responds to system level events
such as system start up and shutdown or screen paint
requests.
Design Guidelines 

The ISF subsystem provides the services for the application
to present multimedia content. It is architecturally
layered into three distinct components which parallel the
Model-View-Controller paradigm. This separates the core
business objects and their data (the Model), from the visual
representation of this information (the View), from the logic
to control and react to changes in the Model or the View (the
Controller). The architecture provides boundaries between
the graphical style of the system (Stage, Roles and Scenes),
the operational code (Actors, Scene Director and Stage
Manager), and the underlying Content Providers (Business
Objects). These sections are the Visual, the Behavioral, and
the Factual components.

The Factual layer is not aware of the Visual layer. This
allows the visual metaphor to change, without disrupting the
underlying business domain model. The Behavioral layer
mediates between the Factual and Visual layers and should
avoid very complex interactions with either layer. Where
possible, anonymous communication via a Publish/Subscribe
pattern is used to avoid further interdependencies
between the layers.
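
A small Java sketch of anonymous Publish/Subscribe communication between layers is given below. The EventBus, LoanModel and TermLabel classes and the topic name "loan.term" are invented for illustration only; the source does not describe the actual publish/subscribe implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Minimal topic-based event bus; publishers and subscribers never reference each other. */
class EventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    void publish(String topic, String payload) {
        // Copy the handler list so a handler may subscribe during delivery without breaking iteration.
        List<Consumer<String>> handlers =
                new ArrayList<>(subscribers.getOrDefault(topic, Collections.emptyList()));
        for (Consumer<String> handler : handlers) {
            handler.accept(payload);
        }
    }
}

/** Factual layer: knows only the bus, not the Visual layer. */
class LoanModel {
    private final EventBus bus;

    LoanModel(EventBus bus) {
        this.bus = bus;
    }

    void setTerm(int years) {
        bus.publish("loan.term", Integer.toString(years));
    }
}

/** Visual layer: reacts to published changes without knowing who published them. */
class TermLabel {
    String text = "";

    TermLabel(EventBus bus) {
        bus.subscribe("loan.term", value -> text = "Term: " + value + " years");
    }
}
```

Because both classes depend only on the bus, the visual metaphor can be replaced without touching the model, which is the decoupling the paragraph above describes.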

The Stage is identified as the display context. It is able to
communicate only with the Locations it controls. It is hidden
behind the StageManager, where all visual requests need to
be managed.

The ISF is a layered system. All roles in a scene form a
series of visual siblings. These roles can, in fact, contain and
encapsulate other roles. This allows, through recursion, any
number of distinct processing layers. Each child only
communicates with its direct parent, surrendering control of
communicating beyond to the parent. This containment
relationship is possible in both the Visual and Behavioral
layers.

To assist in navigation, Scene Thumbnails are maintained. 
The user may touch on a Scene Thumbnail to return to a 
previously visited Scene. 



Component Descriptions

Visual component        user interaction and presentation of multimedia
                        content

Behavioral component    application behavior and multimedia content
                        retrieval

Factual component       provides multimedia content and business
                        function services and calculations
Process Control

This table describes the various key threads which 
execute within the ISF. 



Thread          Purpose

AWT             Sends windows messages (e.g., screen touches) to
                the ProcessController. This thread is created by
                the Java Virtual Machine.

Main            Initializes the application, then exits. This thread
                is created by the Java Virtual Machine.

Processing      Performs the actions initiated by the
                ProcessController.

Timer           Generates and actions time based system events
                such as session timeouts.

Video Status    Receives notification of video finish events and
                dispatches them to the ProcessController.

Audio Status    Receives notification of audio finish events and
                dispatches them to the ProcessController.

User Touches Location on Stage 
Description 

End users will touch the visible window of the 
application, the Stage. This will initiate a response from the 
application. 

Actors: End User
Components Involved: Visual
Key Objects Involved: Stage
Stage processes User Touch 
Description 

The application window will determine which location is 
affected by the touched area, and will notify the correspond- 
ing Role of a touch. The stage will also control the visual cue 
displayed on the window. 
Actors: End User
Components Involved: Visual
Key Objects Involved: Timer, User Interaction Reporter, 
Location, Media Player 
Role Accepts User Touch Event 
Description 

Each role is notified of a user touch. This will force it to 
request a media change in its corresponding media, and, 
once accomplished, to notify its actor of the user interaction. 
Actors: Stage 

Components Involved: Visual, Behavioral 
Key Objects Involved: Actor, Role 
Actor activates Event Casting 
Description 

The actor will cycle through all of its registered Casting
Lists and activate all castings which are interested in the
specific event. Castings behave polymorphically, and
therefore the behaviour of how to respond is actually held in the
Casting, not the actor.
Actors: Role, Execute Casting 
Components Involved: Behavioral 

Key Objects Involved: Actor, Casting, Stage Manager, 
Scene Director 
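
The polymorphic dispatch in this scenario can be sketched in Java as below. The interface and class shapes are assumptions made for illustration: the method names isInterestedIn and activate, the event strings, and the use of a simple log list in place of the Stage Manager and Scene Director are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** A Casting holds the behaviour for one kind of response to an event. */
interface Casting {
    boolean isInterestedIn(String event);

    void activate(List<String> log);
}

/** Hypothetical casting that requests a scene change. */
class NavigationCasting implements Casting {
    private final String targetScene;

    NavigationCasting(String targetScene) {
        this.targetScene = targetScene;
    }

    public boolean isInterestedIn(String event) {
        return "touch".equals(event);
    }

    public void activate(List<String> log) {
        log.add("navigate:" + targetScene);
    }
}

/** Hypothetical casting that swaps the displayed content. */
class ContentCasting implements Casting {
    public boolean isInterestedIn(String event) {
        return "touch".equals(event) || "timeout".equals(event);
    }

    public void activate(List<String> log) {
        log.add("content-change");
    }
}

class Actor {
    private final List<List<Casting>> castingLists = new ArrayList<>();

    void register(List<Casting> castingList) {
        castingLists.add(castingList);
    }

    /** Cycles through every registered list and activates the interested castings. */
    List<String> onEvent(String event) {
        List<String> log = new ArrayList<>(); // stands in for Stage Manager / Scene Director calls
        for (List<Casting> list : castingLists) {
            for (Casting casting : list) {
                if (casting.isInterestedIn(event)) {
                    casting.activate(log);
                }
            }
        }
        return log;
    }
}
```

The Actor contains no event-specific logic; adding a new kind of response means adding a new Casting implementation rather than changing the Actor.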

Stage Manager Performs Scene Transition 
Description 

The Stage Manager will replace the currently active scene 
with a new scene, based on the information in the Naviga- 
tion Casting. It is important to control how the change 
occurs, to preserve the visual illusion of the Kiosk World. 
This scenario is also invoked when starting, or restarting, the 
application. In this case, there is no current Scene, but the 
application is told to transition to the first Scene of the 
application. 

Actor: Navigation Casting, System Initializer 
Components Involved: Behavioral, Visual 
Key Objects Involved: Stage, Scene Director, Session Man- 
ager 

Scene Director performs Cast Change 
Description

The Scene Director coordinates the activation of all 
timings to ensure that any messages to the Stage Manager 
are grouped together and engaged at an appropriate time. 
This will ensure that all changes to the Roles visible on the
scene will occur at the same time. 
Actors: Content Casting, Slide Casting 
Components Involved: Visual, Behavioral 
Key Objects Involved: Scene Director, Stage Manager 
Scene Director Performs Cast Change with Slide Effect 
Description 

The Scene Director must onstage all non-sliding Roles 
and then display the content of the Slide Casting along a 
straight line until its final destination. 
Actors: Slide Casting 

Components Involved: Visual, Behavioral

Business Object Invokes Business Function 

Description 

The application may request business information from
the Business Domain Model (e.g., return a Repayment
Amount). This interface also supports putting values into the
business objects (such as store a Loan Term Amount). Each
business object is provided a generic interface to invoke
behaviors. The desired behaviour is specified in the Business
Function Casting, and it is capable of returning information to
the Actor associated with the Business Object.
Actors: Business Function Casting 
Components Involved: Factual 
Stage Manager Times Out from Inactivity
Description 

The application must support a time out facility, in the
event that the user walks away from the application prior to 
returning to the Attractor Screen. This will protect the 
privacy of details entered by users. 
Actors: Wait Timer 

Components Involved: Behavioral
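
One way to sketch such a time out facility in Java is shown below. The class name WaitTimer is borrowed from the actor named above, but its constructor, the injected clock, and the poll method are assumptions of this sketch; the source does not specify how the timer is implemented.

```java
import java.util.function.LongSupplier;

/** Sketch of an inactivity timer; the clock is injected so the logic can be checked without waiting. */
class WaitTimer {
    private final long timeoutMillis;
    private final LongSupplier clock;
    private final Runnable onTimeout;
    private long lastActivity;
    private boolean expired;

    WaitTimer(long timeoutMillis, LongSupplier clock, Runnable onTimeout) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        this.onTimeout = onTimeout;
        this.lastActivity = clock.getAsLong();
    }

    /** Called on every user touch to restart the countdown. */
    void userActivity() {
        lastActivity = clock.getAsLong();
        expired = false;
    }

    /** Called periodically (for example from a timer thread); fires the callback once per idle period. */
    void poll() {
        if (!expired && clock.getAsLong() - lastActivity >= timeoutMillis) {
            expired = true;
            onTimeout.run(); // e.g. return to the Attractor Screen and clear entered details
        }
    }
}
```

In production the clock would typically be `System::currentTimeMillis`; injecting it keeps the timeout rule separate from real time.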

Session Manager Resets Application 

Description 

During the execution of the application, it may be nec- 
essary to reset the Stage to the first scene, and clear out the 
session. This may be triggered by system inactivity, or by
direct user request dispatched through a business object. 
Actors: Stage Manager's Wait Timer, Business Object 
Components Involved: Behavioral 
Reset Session Information 

Description

Each session stores information gathered about the user. 
At the end of a session, or by user request, it is possible to 
erase all entered data. This supports privacy. 
Actors: Session Manager 

Components Involved: Behavioral

Responsive Media Displays Media 

Description: 

The Visual component of the ISF is responsible for 
displaying all media (image, audio, video, text) to the Stage. 
It actually interfaces with the underlying media subsystem in
the Client Technical Architecture. Each player is obtained 
through the Gatekeeper, and supports all code required to 
present the media to the user. 
Actors: Role 

Components Involved: Visual, Behavioral, Factual, Content
Players

Key Objects Involved: Responsive Media, Media Player, 
Stage 

Application Requests Hard Copy Printout 

Description

The client application may request a print out of static
information (check list), or dynamic information (product
explanation including current interest rates and other
dynamic components of the product, product simulation
and/or line graph). This is initiated by the end user, through
a Print Casting.
Actors: Print Casting 

Components Involved: Behavioral, Printing 
Key Objects Involved: Business Object 

Reporting Interface Subsystem Model 

Context 

The Reporting Interface Subsystem collects information 
logged by the Application Architecture Layer and sends it to 
the Client Reporting Subsystem in the Technical Architec- 
ture Layer. 

Architecture Overview 

The Reporting Interface information is logged by com- 
ponents within the Application Architecture Layer and sent 
to the Client Reporting Subsystem in the Technical Archi- 
tecture Layer. 
Role 

The Reporting Interface subsystem provides services to 
log user-interaction with the kiosk and report on software 
and hardware faults which occur within the Application 
Architecture Layer. 
Responsibilities 

The Reporting Interface subsystem is responsible for 
gathering and logging user interaction with the kiosk by 
capturing what a user is doing with the system, e.g., which 
scenes they are visiting, which visual elements they are 
interacting with. The Reporting Interface subsystem also 
gathers and logs information related to business products
which the customer is interested in, the output of business 
functions which the customer has invoked and business data 
which the customer has input. Finally, the Reporting Inter- 
face subsystem captures information relating to the software 
and hardware performance of the kiosk. This information 
can then be used for error handling and fault management 
analysis. 
Exclusions 

The Reporting Interface subsystem does not include services
to gather and log customer-related information such as
a customer's name and telephone numbers.
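
For illustration only, the logging responsibility and the exclusion described above might be sketched as follows. The function name log_interaction, the record fields, and the in-memory list are assumptions made for this sketch; the specification does not define a record format.

```python
import time

# Customer-identifying fields that the Reporting Interface does not log.
# The field names are hypothetical.
EXCLUDED_FIELDS = {"customer_name", "telephone"}

def log_interaction(log, scene, element, now=None, **details):
    """Append one user-interaction record, dropping excluded customer fields."""
    record = {
        "time": time.time() if now is None else now,
        "scene": scene,        # which scene the user is visiting
        "element": element,    # which visual element the user touched
    }
    record.update({k: v for k, v in details.items() if k not in EXCLUDED_FIELDS})
    log.append(record)
    return record
```

Product interest and business-function output can be passed as extra keyword details; anything named in EXCLUDED_FIELDS is silently dropped before the record is stored.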

Systems Management Subsystem Model 

Context 

Systems Management involves the definition of a com- 
bination of automated and manual procedures. Automation 
is achieved primarily through the use of Systems Manage- 
ment Server (SMS). SMS is a tool within the Microsoft 
BackOffice suite of tools which can centrally manage system
software and hardware in a distributed environment. 
Systems Management Subsystem Architecture 

The Systems Management Subsystem architecture con- 
sists of two components, the Systems Management Server 
(SMS), and Fault Monitoring. 
Systems Management Server (SMS) 

Systems Management Server (SMS) is a Microsoft tool 
that can be used to distribute software/content, take software 
audits, perform fault diagnosis and take remote control. 
SMS is supplemented by a component developed for the 
architecture, namely the File Transfer Utility. 
Fault Monitoring 

The Kiosk is monitored real-time through a Heartbeat 
message system. Heartbeat pulses are sent from the Kiosk at 
a configurable rate (say one every minute) and are monitored 
at a console running the Kiosk Monitoring Application. If 
the status of a Kiosk changes to indicate a fault, the monitoring
application will initiate the appropriate action. Some errors
will be handled through the existing Operations Center. The
routing of these errors is covered in the Application Services 
Subsystem. 
Role 

SMS is used for medium sized software distribution (both 
code and content) and for fault diagnosis of the remote 
kiosks. There are a large number of features that make SMS 
a flexible and useful support facility. Fault monitoring will 
provide the means to view the real-time status of each kiosk 
and associated peripherals. When a problem occurs at a 
particular kiosk (such as running out of paper), the kiosk will 
be brought to the attention of an operations representative. It
will be possible to observe the kiosk status to verify the 
resolution of the problem. 
Responsibilities 

The SMS is responsible for software distribution, hard- 
ware fault management and diagnosis, and client remote 
re-boots. The SMS also provides a user interface to selec- 
tively view the status of all kiosks in the network. 
Creation, Existence, and Management 

The SMS resides on a dedicated server in accordance with 
a preferred embodiment and is available at all times. 
Performance 

The elapsed time between a fault occurring on an MMT 
and subsequently being displayed to operations is dependent 
upon the frequency of the heartbeat, the ability for the 
application server to process the heartbeat and the frequency 
of refresh on the operations terminal. An example in accordance
with a preferred embodiment is presented below.

Step                                                Elapsed Time
                                                    (minutes)

A fault occurs immediately after a heartbeat        1:00
message is sent.
The Heartbeat message is sent to the                0:01
Application Server
The Application Server receives and processes       0:01
the heartbeat message
The operations Fault Monitoring Application has     2:00
just refreshed and is only refreshing once
every 2 minutes.
The Fault Monitoring Application refreshes its      0:05
kiosk status main window
The Fault Monitoring Application refreshes its      0:05
kiosk status view (This time is for only one
view window open. If there is more than one
view window open, this time should be
multiplied by the number of open view windows)
Total                                               3:11

If a heartbeat message is not received from a kiosk within 
five minutes (configurable), Fault Monitoring will set the
kiosk status to Unknown. This time lag is required to avoid 
erroneously reporting MMT machines as unknown when the 
real problem lies in a slightly slower than normal processing 
of a heartbeat message. 
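
The elapsed-time example and the Unknown rule above are simple arithmetic, and a short sketch may make them easier to check. The step durations below are copied from the example; the function names and the UNKNOWN_AFTER constant are assumptions made for this sketch. Adding the rows as listed gives 3:12, one second more than the printed total of 3:11; the sketch reports the sum of the rows.

```python
from datetime import datetime, timedelta

# Step durations in seconds, copied from the elapsed-time example above.
STEPS = [
    ("fault occurs just after a heartbeat is sent", 60),
    ("heartbeat message sent to Application Server", 1),
    ("Application Server processes heartbeat", 1),
    ("monitoring application waits for next refresh", 120),
    ("main window refresh", 5),
    ("status view refresh (one view window)", 5),
]

def worst_case_seconds(open_view_windows=1):
    """Sum the steps; the view refresh is multiplied by the open view windows."""
    fixed = sum(seconds for _, seconds in STEPS[:-1])
    return fixed + STEPS[-1][1] * open_view_windows

def format_mmss(seconds):
    return f"{seconds // 60}:{seconds % 60:02d}"

# Hypothetical default; the text states that the five-minute limit is configurable.
UNKNOWN_AFTER = timedelta(minutes=5)

def kiosk_status(last_status, last_heartbeat, now):
    """Report 'Unknown' when no heartbeat has arrived within the limit."""
    if now - last_heartbeat > UNKNOWN_AFTER:
        return "Unknown"
    return last_status
```

Keeping the Unknown limit well above the heartbeat interval is what prevents a slightly slow heartbeat from being reported as a lost kiosk.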
Logical Components 

An SMS site comprises two components: a Primary
Site and Clients. A primary site is the topmost level in the
SMS hierarchy. It contains its own SQL database to store
system and inventory information for itself and the secondary
sites underneath it. Clients (kiosks) are administered
from the primary site. The client sends its hardware/software
information to the SMS server through the SMS Inventory
service.
Fault Monitoring

After a configurable time interval the Client takes a status 
check of the Machine, Printer and the Application and sends 
it to the server in a heartbeat message. The server then places 
the status into a Kiosk Status Database that is monitored by 
operations staff for faults. 
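
As a minimal sketch of the heartbeat exchange just described, the client bundles the Machine, Printer and Application checks into one message and the server keeps the latest status per kiosk. The JSON encoding, the field names, and the in-memory KioskStatusDatabase class are assumptions made for illustration; the specification does not give a message layout.

```python
import json
import time

def build_heartbeat(kiosk_id, machine_ok, printer_ok, application_ok, now=None):
    """Client side: bundle the three status checks into one heartbeat message."""
    return json.dumps({
        "kiosk_id": kiosk_id,
        "sent_at": time.time() if now is None else now,
        "machine": "OK" if machine_ok else "FAULT",
        "printer": "OK" if printer_ok else "FAULT",
        "application": "OK" if application_ok else "FAULT",
    })

class KioskStatusDatabase:
    """Server side: in-memory stand-in for the Kiosk Status Database."""

    def __init__(self):
        self._latest = {}

    def record(self, heartbeat_json):
        """Store the most recent heartbeat for the sending kiosk."""
        message = json.loads(heartbeat_json)
        self._latest[message["kiosk_id"]] = message

    def faulty_kiosks(self):
        """Return the ids of kiosks whose latest heartbeat reports any fault."""
        return sorted(
            kiosk_id for kiosk_id, m in self._latest.items()
            if "FAULT" in (m["machine"], m["printer"], m["application"])
        )
```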
Server Communications Subsystem Model 
Context 

The Server Communications subsystem is part of the 
Server Technical Architecture. The Server Communications 
subsystem handles all communications between clients, the 
central server, and the mainframe host. 
Architecture Overview 

The Asynchronous Messaging component provides the 
asynchronous message based communication between the 
client and the server, using standard Internet mail protocols, 
SMTP and POP3. 
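
The sketch below illustrates how a client message could be wrapped in standard Internet mail format before being handed to SMTP and later retrieved over POP3. It uses only the Python standard library email package. Carrying the message type in the Subject header and the payload as a JSON body are assumptions made for this sketch, as are the example addresses; the transport itself is not shown.

```python
import json
from email import message_from_bytes, policy
from email.message import EmailMessage

def build_async_message(sender, recipient, message_type, payload):
    """Client side: wrap a payload in an Internet mail message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = message_type          # assumption: type carried in Subject
    msg.set_content(json.dumps(payload))   # assumption: JSON text body
    return msg.as_bytes()

def read_async_message(raw):
    """Server side: recover the message type and payload from the raw bytes."""
    msg = message_from_bytes(raw, policy=policy.default)
    return str(msg["Subject"]), json.loads(msg.get_content())
```

Because the sender does not wait for a reply, a kiosk can queue such messages while the link is down and submit them when it returns.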

The Business Process Access Module component pro- 
vides the common point to invoke predefined business 
functionality such as recording interaction information from
the MMT. HTTP is the protocol used to communicate with 
HTTP servers on the World Wide Web. It is used in the MMT 
to distribute small updates of application components and 
content during the client configuration process on start-up. 
Access to mainframe database resident data is done by 
replicating the required database tables to corresponding 
server resident database tables. The reverse process is used 
to centrally store the data accumulated on the server to the 
mainframe database tables. Generic alerts from the server 
are transmitted to the mainframe through an interface to the 
mainframe's front end processor. 
Role 

The role of the Server Communications subsystem is to 
provide the server with communications facilities between
the MMT (or Internet) client and the network server and 
between the network server and the mainframe systems. In 
addition, the Server Communication subsystem isolates and 
provides access to organization specific functionality. 
Responsibilities 

The Server Communication subsystem complies with 
standard Internet protocols, to allow ease of porting to that 
delivery channel. In addition, the Server Communication
subsystem provides reliable asynchronous communication 
between the client and the application server, and controlled 
and reliable access to organization specific functionality. 
Finally, the Server Communication subsystem provides a 
facility to deliver updates to the configuration of the client, 
such as application components and content, and provides 
access to the organization's legacy systems through pre- 
defined processes, such as database replication and generic 
alert reporting. 
Exclusions 

Server Communication is confined to invoking modules 
which conform to the ACT messaging architecture. If com- 
munication to another platform is required, this must be 
located within the external systems module using the organization's
messaging or access methods.
Transaction Interface Subsystem Model 
Architecture Overview 

Transactions are initiated by components within the 
Application Architecture Layer and sent to the Client 
Reporting Subsystem in the Technical Architecture Layer. 
The Transaction services identified in this document are not 
implemented as separate components in their own right, but 
are implemented as extensions to existing Application
Architecture components.
Role

The Transaction Interface subsystem is responsible for
providing an interface to the application for storing
contact information about the end-user. This includes, but
is not limited to, information required to complete a loan,
survey-based information on customer demographics,
account balance inquiry, and funds transfer.
Customer Lead Transaction Execution 

The customer lead transaction execution application 
facilitates the Interface Support Framework and enables the
services of the Reporting Subsystem of the Technical Archi- 
tecture to support the gathering and storing of information 
about the end-user. 
Exclusions 

The current usage of the Transaction Interface assumes 
that only asynchronous communications are available.
Server Application Services Subsystem Model 
Architecture Overview 

The Application Services Sub-system includes a set of 
definitions for building the MMT server application and 
architecture modules.
Role 

The Server Application Services sub-system includes the
set of services and definitions for accessing application and
architecture functionality. This subsystem defines the structure
and support services for building and executing applications
and modules on the Application Server or Operations
workstation platform. The functionality supported includes
transaction processing to/from the MMT, such as customer
referral information, customer interaction information, MIS
information, fault information, product rates and prices
information, application configuration information, message
receipt information, and heartbeat status information.
Responsibilities 

The Server Application Services sub-system processes
application business logic on the server independently from
the underlying database management system. The application
business logic includes customer referral information,
customer interaction information, MIS information, fault
information, product rates and prices information, application
configuration information, message receipt information,
and heartbeat status information.

The Server Application Services sub-system also provides
a common service available to all server and client applications
to log an error, decode a given code (for example,
'1'=NSW, '2'=QLD, etc.), and retrieve configuration information
from the registry located on each machine.

Finally, the Server Application Services sub-system
invokes a Business Process (BP) for a given BP message.
Logical Components 

The Server Application Services sub-system includes
Common Services, Data Access Module, Business Process,
and the Business Process Access Module. Definitions of
each component are given below.
Component Descriptions 
Definitions 

Common Services. Common services to support the
development of application functionality include decoding
code tables, retrieving configuration information from the
registry, message handling, and the support for logging and
handling server application errors.
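
A minimal sketch of these common services follows. The code table holds only the two entries given in the text ('1'=NSW, '2'=QLD); the table name, the registry key, and the use of Python dictionaries in place of a code table and the machine registry are assumptions made for illustration.

```python
import logging

# Hypothetical code table; only the two state entries come from the text.
CODE_TABLES = {"state": {"1": "NSW", "2": "QLD"}}

# Stand-in for the per-machine registry; the key and value are hypothetical.
REGISTRY = {"heartbeat_interval_seconds": "60"}

log = logging.getLogger("common_services")

def decode(table, code):
    """Translate a stored code into its display value; log and re-raise if unknown."""
    try:
        return CODE_TABLES[table][code]
    except KeyError:
        log.error("unknown code %r in table %r", code, table)
        raise

def get_config(key, default=None):
    """Retrieve a configuration value, falling back to a default."""
    return REGISTRY.get(key, default)
```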

Data Access Module. A data access module (DAM)
provides access to data within the application database. A
DAM performs specific data access such as Insert, Delete,
Update, Select, Select All across one or more tables. It is an
MFC Extension DLL encapsulating a Recordset object
which uses ODBC to access the underlying DBMS. The
DAM definition outlines how these modules are used and
coded.
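
The five operations named above can be illustrated with a small sketch. The described DAM is an MFC Extension DLL using ODBC; the sketch below instead uses Python's standard sqlite3 module purely to show the shape of the interface. The class name, the customer_lead table, and its columns are hypothetical.

```python
import sqlite3

class CustomerLeadDAM:
    """Data access module for one hypothetical table, customer_lead."""

    def __init__(self, connection):
        self.conn = connection
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS customer_lead ("
            "id INTEGER PRIMARY KEY, product TEXT NOT NULL)"
        )

    def insert(self, product):
        cur = self.conn.execute(
            "INSERT INTO customer_lead (product) VALUES (?)", (product,))
        return cur.lastrowid

    def update(self, lead_id, product):
        self.conn.execute(
            "UPDATE customer_lead SET product = ? WHERE id = ?",
            (product, lead_id))

    def delete(self, lead_id):
        self.conn.execute(
            "DELETE FROM customer_lead WHERE id = ?", (lead_id,))

    def select(self, lead_id):
        return self.conn.execute(
            "SELECT id, product FROM customer_lead WHERE id = ?",
            (lead_id,)).fetchone()

    def select_all(self):
        return self.conn.execute(
            "SELECT id, product FROM customer_lead ORDER BY id").fetchall()
```

Callers see only the five methods, so the business logic stays independent of the underlying database management system.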
Business Process. A business process (BP) is the application
or architecture functionality that may be invoked by
the Business Process Access Module architecture. A BP is
identified by a message type. A BP accepts a request
message defined by the BP and may provide a response
message for synchronous messages. Database access is
provided to the BP by DAMs. A BP is an MFC extension
DLL with a defined entry point.

Business Process Access Module. The Business
Process Access Module (BPAM) is the architecture component
that provides access to business processes. The BPAM
invokes a BP for a given message type. The BPAM accesses
the message address table to look up BP module details. This
component is detailed in the communications sub-system.
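
The lookup-and-invoke behavior of the BPAM can be sketched as a dispatch table keyed by message type. In the described system a BP is a DLL with a defined entry point; in this sketch a BP is an ordinary Python callable, and the class name, message fields, example BPs, and the rate value are all made up for illustration.

```python
from typing import Callable, Dict, Optional

# A business process takes a request message and may return a response.
BusinessProcess = Callable[[dict], Optional[dict]]

class BusinessProcessAccessModule:
    def __init__(self):
        # Stand-in for the message address table: message type -> BP entry point.
        self._message_address_table: Dict[str, BusinessProcess] = {}

    def register(self, message_type, bp):
        self._message_address_table[message_type] = bp

    def invoke(self, message):
        """Look up the BP for the message type and invoke it."""
        message_type = message["type"]
        try:
            bp = self._message_address_table[message_type]
        except KeyError:
            raise LookupError(
                f"no business process for {message_type!r}") from None
        return bp(message)

def record_interaction(message):
    # Hypothetical asynchronous BP: no response message.
    return None

def get_product_rate(message):
    # Hypothetical synchronous BP: returns a response message (made-up rate).
    return {"type": "product_rate_response",
            "product": message["product"], "rate": 5.25}
```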

Wireless Electronic Valet in Accordance With A Preferred 
Embodiment 

One embodiment of the present invention is a Mobile
Portal Platform including a Mobile Portal and an Electronic 
Valet. The Electronic Valet is a hand held wireless computer 
device executing Thin Client Software. Integrated into the 
Electronic Valet are various sensors, such as GPS, Bio- 
sensors, and Environ-sensors. In addition, recording 
equipment, such as a camera and audio recorder, is also
integrated into the Electronic Valet. The Mobile Portal 
includes a Mobile Portal Server which is connected to 
various third party content and service providers through the 
Internet or a Mobile Portal Extranet. 

FIG. 26 is a flow chart illustrating how the hardware and 
software of one embodiment of the present invention operate.
An Electronic Valet 2602 receives input data from
sensors, GPS, camera, microphones, and other user inputs 
2600 integrated with the wireless hand held device. The Thin 
Client application executing on Electronic Valet 2602, as 
discussed in detail below, allows the Electronic Valet 2602 
to execute many different software applications without the 
need for a large amount of internal memory and storage 
capacity. The Electronic Valet 2602 forms a message based 
on the data received and the user input. The Electronic Valet 
2602 then transmits the message via antennae 2604 to the 
Mobile Portal 2606. The Mobile Portal 2606 parses the 
message received from the Electronic Valet 2602 and forms 
a new message based on the message received. The Mobile 
Portal 2606 then determines the appropriate third party 
service provider 2608 to transmit the new message to, based 
on the content of the message received from wireless hand 
held device 2602, and then transmits the new message. The 
third party service provider then performs the appropriate 
service and transmits the result back to the Mobile Portal 
2606. The Mobile Portal then forms a message based on the 
data received from the third party service provider 2608 and 
transmits the message back to the Electronic Valet 2602. The 
Electronic Valet 2602 then formats and displays the data 
received. The Electronic Valet 2602 utilizes a wireless 
modem such as a Ricochet SE Wireless Modem from 
Metricom. 
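
The message flow of FIG. 26 can be summarized in a short sketch: the Electronic Valet forms a message, the Mobile Portal parses it, selects a third party service provider from its content, and relays the result back. The message fields, the MobilePortal class, and the weather provider are hypothetical and serve only to illustrate the routing step.

```python
def form_valet_message(service, sensor_data, user_input):
    """Electronic Valet: form a message from sensor data and user input."""
    return {"service": service, "data": sensor_data, "input": user_input}

class MobilePortal:
    def __init__(self, providers):
        # providers: service name -> callable standing in for a third party provider
        self.providers = providers

    def handle(self, valet_message):
        # Parse the incoming message and form a new one for the provider.
        request = {"data": valet_message["data"],
                   "query": valet_message["input"]}
        # Choose the provider from the content of the received message.
        provider = self.providers[valet_message["service"]]
        result = provider(request)
        # Form the reply that is sent back to the Electronic Valet.
        return {"service": valet_message["service"], "result": result}

def weather_provider(request):
    # Hypothetical provider used only for illustration.
    return f"forecast for {request['data']['gps']}"
```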

Of course, wireless performance isn't nearly as reliable as 
a traditional dial-up phone connection. We were able to get 
strong connections in several San Francisco locations as 
long as we stayed near the windows. But inside CNET's
all-brick headquarters, the Ricochet couldn't connect at all. 
When you do get online, performance of up to 28.8 kbps is 
available with graceful degradation to slower speeds. But 
even the slower speeds didn't disappoint. Compared to the 
alternative — connecting via a cellular modem — the Rico- 
chet is much faster, more reliable, and less expensive to use. 
Naturally, the SE Wireless is battery powered. The modem 
has continuous battery life of up to 12 hours.
Thus, utilizing the wireless modem, a user may utilize the
Mobile Portal 2606 via the Electronic Valet 2602. Using 
appropriate key(s), the user may select a service to use in 
concert with appropriate data obtained from sensors, GPS, 
camera, microphones, and other user inputs 2600. In certain 
circumstances, data may be automatically sent to select 
services based on the type and value of the data obtained by 
the Electronic Valet 2602. For example, when an integrated 
bio-sensor obtains certain predefined data values, an appro- 
priate emergency care provider would be automatically 
contacted. In addition, the data obtained from sensors, GPS, 
camera, microphones, and other user inputs 2600, may also 
be combined before being sent to an appropriate service 
provider. For example, in the example above, GPS position
data may be sent with the bio-sensor data to the emergency 
care provider. The emergency care provider would then 
know the patient's biological data and the location of the 
patient. Appropriate service could then be provided. 
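
The automatic emergency case above amounts to a threshold test followed by combining two sensor readings into one message. The sketch below assumes a heart-rate reading and made-up limits; the text says only that "certain predefined data values" trigger the contact, so the sensor type, the limits, and the message fields are all hypothetical.

```python
# Hypothetical limits in beats per minute; not taken from the specification.
HEART_RATE_LIMITS = (40, 180)

def emergency_message(heart_rate, gps_position):
    """Return a combined message for the emergency care provider, or None
    when the bio-sensor reading is inside the predefined range."""
    low, high = HEART_RATE_LIMITS
    if low <= heart_rate <= high:
        return None
    return {
        "service": "emergency_care",
        "bio": {"heart_rate": heart_rate},   # patient's biological data
        "location": gps_position,            # GPS position sent with it
    }
```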
Mobile Portal Platform 

The Mobile Portal Platform is a high-impact, server-based
application in accordance with a preferred embodiment that 
is focused on the theme of delivering services and providing 
a personalized experience for each customer via a personal 
site located on a server. The services are intuitively orga- 
nized around satisfying customer intentions — fundamental 
life needs or objectives that require extensive planning 
decisions, and coordination across several dimensions, such 
as financial planning, healthcare, personal and professional 
development, family life, and other concerns. Each member
owns and maintains his own profile, enabling him to create 
and browse content in the system targeted specifically at 
him. From the time a demand for services is entered, 
intelligent agents are utilized to conduct research, execute 
transactions and provide advice. By using advanced profil- 
ing and filtering, the intelligent agents learn about the user, 
improving the services they deliver. 

A preferred embodiment of a system utilizes a Windows 
CE PDA equipped with a GPS receiver. The embodiment is 
configured for a mall containing a plurality of stores. The 
system utilizes a GPS receiver to determine the user's 
location. One advantage of the system is that it enables the 
retrieval of data for nearby stores without relying on the 
presence of any special equipment at the mall itself. 
Although the accuracy of smaller, inexpensive receivers is 
limited to approximately 75-100 feet, this has thus far 
proven to be all that is necessary to identify accurately the 
immediately surrounding stores. The system uses generated 
data rather than actual store ads and prices. Well structured 
online catalogs are used. Other embodiments utilize agents 
that "learn to shop" at a given store using a relatively small 
amount of knowledge. Moreover, as retailers begin to use 
standard packages to create online catalogs, we can expect 
the number of differing formats to decrease, resulting in a 
tractable number of competing formats. As electronic com- 
merce progresses, it is not unreasonable to expect standards 
to evolve governing how merchandise offerings are repre-
sented. 

Goal Specification 

Before leaving on a shopping trip, a shopper creates a 
shopping list of items by selecting from a preexisting set of 
approximately 85 product categories (e.g. men's casual 
pants, women's formal shoes, flowers, etc.). They also
indicate the shopping venue they intend to visit from a list 
of malls. 

Initial Store Selection 

Upon arriving at the mall, the system begins by suggesting the closest
store that sells at least one item of a type entered by the user
during goal specification. Along with the store name, a
system in accordance with a preferred embodiment prepares
a list of the specific items available and their prices. A map
of the mall displays both the precise location of the store and
the shopper's current location. The shopper can query the
system to suggest a store at any time based on their current
location.
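
The store suggestion step can be sketched as a nearest-neighbor search restricted to stores that sell something on the shopping list. The store data below is generated for illustration, positions are assumed to be (x, y) coordinates in feet, and the function names are hypothetical.

```python
import math

# Generated sample data; names, positions (feet) and prices are made up.
STORES = [
    {"name": "Shoe Stop", "pos": (50, 0), "items": [
        {"name": "leather pump", "category": "women's formal shoes", "price": 79.0}]},
    {"name": "Petal Place", "pos": (300, 40), "items": [
        {"name": "rose bouquet", "category": "flowers", "price": 25.0}]},
    {"name": "Slacks & Co", "pos": (120, 10), "items": [
        {"name": "khaki chino", "category": "men's casual pants", "price": 35.0},
        {"name": "silk tie", "category": "ties", "price": 20.0}]},
]

def distance(a, b):
    """Straight-line distance between two (x, y) positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def suggest_store(shopper_pos, shopping_list, stores):
    """Return (store name, matching items with prices) for the closest store
    selling at least one listed category, or None if no store does."""
    wanted = set(shopping_list)
    candidates = [s for s in stores
                  if any(i["category"] in wanted for i in s["items"])]
    if not candidates:
        return None
    best = min(candidates, key=lambda s: distance(shopper_pos, s["pos"]))
    matches = [(i["name"], i["price"])
               for i in best["items"] if i["category"] in wanted]
    return best["name"], matches
```

Calling suggest_store again with a new position corresponds to the shopper asking for a suggestion from wherever they currently stand.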
Browsing 

Many shoppers visit malls, or shop generally, without a
particular destination in mind. To address this need, the
display operates in a browse mode for use by shoppers as they
stroll through the mall. FIG. 27A illustrates a display in
accordance with a preferred embodiment of the invention.
In browse mode the system suggests items of interest
for sale in the stores currently closest to the shopper. An item
is considered to be of interest if it matches the categories
entered in the goals screen. If there are no items of interest,
the general type of merchandise sold at that store is
displayed, rather than specific items. As the shopper strolls,
a map displays his or her precise current location in the mall.
If an item displayed is selected by the shopper while
browsing, the system alerts the shopper to the local retailer
offering the same product for the lowest price, or announces
the best local price. This search is restricted to the local mall,
as that is the assumed radius the shopper is willing to travel.
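
Browse mode and the local price check can be sketched as two small functions. The store data is generated for illustration, the 100-foot radius used to decide which stores count as "closest" is an assumption, and the function names are hypothetical.

```python
import math

# Generated sample data for one mall; names, positions (feet) and prices are made up.
STORES = [
    {"name": "Slacks & Co", "type": "menswear", "pos": (120, 10), "items": [
        {"name": "khaki chino", "category": "men's casual pants", "price": 35.0}]},
    {"name": "Denim Depot", "type": "casual wear", "pos": (400, 0), "items": [
        {"name": "khaki chino", "category": "men's casual pants", "price": 29.0}]},
    {"name": "Candle Corner", "type": "home fragrance", "pos": (90, -20), "items": [
        {"name": "pillar candle", "category": "candles", "price": 8.0}]},
]

def browse(shopper_pos, goals, stores, radius=100.0):
    """For each store within `radius`, list the items of interest, or fall
    back to the store's general merchandise type when nothing matches."""
    shown = {}
    for store in stores:
        if math.dist(shopper_pos, store["pos"]) > radius:
            continue
        hits = [i["name"] for i in store["items"] if i["category"] in goals]
        shown[store["name"]] = hits if hits else store["type"]
    return shown

def best_local_price(product, stores):
    """Search only this mall's stores for the lowest price on `product`."""
    offers = [(i["price"], s["name"])
              for s in stores for i in s["items"] if i["name"] == product]
    return min(offers) if offers else None
```

Note that best_local_price deliberately searches every store in the mall, not just the nearby ones, which matches the restriction of alternatives to the local mall.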
Alternatives 

It is worth emphasizing that the current inventive agent
will support broader aspects of the shopping task. For
example, it could operate as a bi-directional channel. That is,
not only can it provide information to the shopper, but, at
the shopper's discretion, it may provide information to
retailers as well. In this embodiment, the system indicates a
shopper's goals and preferences to a retailer-based agent,
which, in turn, responds with a customized offer that bundles
service along with the product. Enabling the customization
of offers is crucial to gaining the cooperation of retailers who
are reluctant to compete solely on price, and is of value to
customers who base their purchases on criteria other than
price. While the preferred embodiment focuses on location-based
filtering primarily in the context of the shopping task,
the current invention provides the basis for "physical task
support" agents that provide an information channel to
people engaged in various tasks in the physical world.
The Predictive Value of Location 

The present invention is a significant advance over non-location-based
agents because a user's physical location is
often very predictive of his or her current task. If we know
someone is at a bowling alley or a post office we can
reasonably infer their current activity. Knowledge of a user's
current task largely determines the type of information they
are likely to find useful. People are unlikely to concern
themselves with postal rates while bowling, or optimal
bowling ball weight while buying stamps. In addition,
knowledge of the resources and obstacles present at a
particular location suggests the range of possible and likely
actions of someone at that location. This awareness of a
user's possible and likely actions can be used to further
constrain the type of information a user is likely to find
useful. For example, knowledge of a restaurant's wine list
could be used by a recommender system to constrain the
wine advice it presents.

Knowledge of a shopper's precise location in a shopping
mall is valuable because it enables the identification of the
stores immediately surrounding the shopper. The offerings
of the stores closest to the shopper represent the immediate
choices available to the shopper. Given that shoppers place
a premium on examining merchandise first hand and that
there is a cost associated with walking to other stores, the
merchandise of the closest surrounding stores constitutes the
most likely immediate selections of the shopper.
Consequently, among the most useful information provided
at any given time is the availability of merchandise in the
surrounding stores that matches the shopper's previously stated
goals.

People tend to move to different locations while performing
many of their tasks. This suggests that their immediate
surroundings do not completely capture the full range
of options they may have. In fact one of the main reasons for 
leaving a location is to perform an action that is not possible 
at the current location. 

Nevertheless, people do tend to address most tasks within
relatively local areas. Thus, while their immediate surroundings
suggest the options they have available at a given point
in time, a broader view of a location will often capture the 
options they are likely to consider over the course of a task. 
In the case of mall shopping, for example, the stores 
immediately surrounding the shopper represent the options 
available at that moment. Mall shoppers, however, are 
generally willing to travel to any store within the mall. 
Therefore the potential options over the entire shopping trip 
include all the stores in the mall. Accordingly, information 
is presented on offerings of interest only from the immedi- 
ately surrounding stores because these are the immediately 
available options. When asked for alternatives, the system 
restricts itself to all the stores within the mall — the area 
within which the shopping task as a whole is likely to be 
performed. Being alerted that a store hundreds or thousands 
of miles away sells the same merchandise for a few dollars 
less than the cheapest local alternative is of little value in 
cases when shoppers require a first hand examination of the 
merchandise in question or are not willing to wait for 
shipping. 

Physical vs. Online Shopping 

In addition to the significant advantages over non-location-based
agents, the present invention overcomes disadvantages
of online (or web) shopping. It is tempting to argue that
online shopping will soon become the predominant mode of 
shopping, pending only greater penetration of home 
computers, the expansion of online offerings, and better 
online shopping tools. At first glance it would therefore 
appear to be a mistake to begin using location to support an 
activity that will become virtualized. Already we've seen the 
emergence of a number of software agents that support 
online shopping. For example, some programs allow users to
identify the cheapest source for a music CD, given a title.
Similar programs have been developed for buying books,
such as Bargain Bot. These systems demonstrate the potential
of electronic commerce web agents to create perfect
markets for certain products. The success of these agents 
will encourage the development of similar web shopping 
agents for a greater variety of goods. 
The Limitations of Online Shopping 

Certainly online shopping will continue to grow and the 
trend towards more powerful online shopping agents will 
continue. Nevertheless, it also seems clear that no matter 
how sophisticated web-agents become, traditional physical 
shopping will continue to dominate the market for the 
foreseeable future. Several inherent difficulties of online 
shopping will ensure the continued reliance on physical 
shopping: 

Non-fungible goods 

Web-based shopping agents have typically enabled users
to identify the cheapest price for fungible products such as
books and music CDs. While this capacity to create "perfect
markets" for such commodities is of great benefit to
consumers, several difficulties exist that will complicate
applying these approaches to arbitrary products.
Commodities are particularly well suited to shopping
agents because it is easy to make comparisons between
competing offers. Because commodities are fungible, one of
the very few dimensions upon which they differ is price.
Price therefore becomes the primary, if not sole, criterion
upon which purchasing decisions are made.

As soon as we move beyond commodities, however,
several other criteria become important. For example, how
do we compare items such as sweaters, mattresses, or tables?
In addition to price we care about the materials used, the
color, how it fits and feels, and the workmanship. Similar
problems apply to most other products.
Imprecise goal specification 

A second, related difficulty lies in communicating our
desires to an agent. Shopping agents are great if the user
knows the precise commodity he or she wants. Then the user
can simply enter the product by name. Unfortunately, if users
don't have a specific item in mind when they shop, then the
problem of conveying what is wanted to an agent becomes
more difficult. For example, how does the user tell an agent
what kind of lamp they want for their living room?

Undeveloped Preferences

Interfaces that allow shoppers to include descriptive features
like price ranges, color, options, brands, etc., can help
address the above problem, but they are not enough. Much
of the time shoppers either haven't formed preferences or
can't articulate their desires until after they've started shopping
and had a chance to examine various examples of the
target products.
Shopping is Entertainment 

People like to shop and do so without having a specific
purchase in mind. One study found that 42% of consumers
are "non-destination shoppers" who visit the mall primarily
for leisure browsing and socializing.
Shopping is Sensory 

Even if the user could effectively provide these details,
most would be unlikely to delegate a purchasing decision to
such an agent. After all, many people are uncomfortable
even trusting spouses to make appropriate purchases on their
behalf. Most people want to see and touch first hand what
they're considering before making a purchase decision. The
few preferences they may provide an agent cannot replace
this rich, first-hand experience. At best such preferences
could be used to generate a candidate set for shoppers to
consider.

Instant Gratification 

Shopping is often a very emotional activity. People are
pleased with their purchases and often can't wait to get home 
to try them out. The inherent delay between online purchases 
and their receipt is a significant issue to those who simply 
must take home their selections as soon as they see them. 

In the end, consumers will continue to engage in physical
shopping because of the limitations listed above. However,
the fact that the task can't completely be delegated to
software agents does not rule out a role for them. First, users
find them useful for purchasing commodities when they
know what they want. A second role, however, is to support
the physical shopping task itself, throughout the time that a
person is engaged in it. This, of course, is the approach taken
in the Shopper's Eye project.
Shopper's Eye 

At first blush it may seem that the current invention is
subject to some of the same limitations as purely web-based
agents. After all, why should it be any easier to communicate
your goals to a PDA than it is to a web-based agent? Why
would your preferences be any more developed for purchases
supported by a PDA system than a web-based agent?

A key difference between purely web-based agents and
the current "physical task support agents" (i.e. an agent that
supports a user engaged in a task in a physical setting) is that
web-based agents are completely responsible for conveying
all information that will be considered by the user. On the
other hand, "physical task support" agents in accordance
with a preferred embodiment can augment the approaches of
web-based agents by referring to aspects of a user's environment.
For example, it is not terribly important to convey
richly the feeling of a particular sweater if the sweater is in
a store thirty feet away. The agent need only refer the shopper to the
sweater. The shopper will gain a much better appreciation of
the sweater by trying it on than through anything that can be
conveyed by the system. When too many products match an
imprecisely specified goal for a web-based agent, a more
restrictive search must be made. However, many matches
simply indicate that there is a store that is likely to be of great
interest to the shopper and therefore should be visited. Once
inside, narrowing down the merchandise of interest in person
will often be far easier than refining the goals on a
web-based agent. Therefore physical task support agents can
assist users to elaborate their preferences and identify specific
goals by calling users' attention to aspects of their
physical environment as a means of conveying information
throughout the entire course of the task.

Negotiation of Offers

The present shopping agent is not restricted to providing
the shopper with information. It is possible to negotiate
prices and service options with retailers.
Product Selection, Purchase and Product support

The present invention facilitates the transaction itself and
can be used as a channel through which product service can
be delivered.

FIG. 27B is an illustration of the Mobile Portal platform
2710 including a Mobile Portal 2712 and an Electronic Valet
2713. The Electronic Valet 2713 includes a supporting
hardware device 2716, such as a wireless PDA, and a Mobile
Portal Thin Client standard 2714 executing on top of a Thin
Client Operating System 2718. The Mobile Portal consists of
an encryption and decryption element 2720, a Mobile Portal
Server 2722, intelligent agents 2724, a Customer intelligence
element 2726, and a Customer database 2728.

Thin Client is a generic term used to describe a group of
rapidly emerging technologies that provide a reduction in
total cost of ownership through a combination of reduced
hardware costs, reduced maintenance and support costs,
reduced LAN/WAN bandwidth requirements, reduced down
time, improved performance and enhanced security. The
term "Thin" in Thin Client refers to the (very small) size of
the client operating system. In contrast, traditional PC
operating systems (DOS, Windows 95, etc.) are considered
"Fat" Clients due to their large size and resource requirements.
Despite the fact that the Thin Client operating
systems are thin, the capabilities of Thin Clients are robust.
The Promise of Physical Shopping Agents

It is hardly surprising that physical shopping has been 30 Client solutions are deployed today in mission critical 
neglected by the agents community. After all, until very environments and they are providing reliable and responsive 
recently there simply was no reliable way to deliver cus- access to a myriad of applications. The Mobile Portal Thin 
tomized information to individual shoppers in remote loca- Client 2714 is a Thin Client wherein the majority of the 
lions. However, the explosive growth of PDAs, and their processing is done on the Mobile Portal Server 2722 and 
increasingly sophisticated communications capabilities 35 related third party content and service providers 2730. The 
promise to make them effective channels of "just in time" user utilizes the Mobile Portal Thin Client application 2714 
information to users wherever they happen to be. The to select services and review information provided by the 
present invention provides an ituitive, novel agent that Mobile Portal Platform 2710. The Mobile Portal Thin Client 
supports physical shopping by exploiting the promise of this application 2714 is made more device independent by the 
developing channel that support all phases of the shopping 40 use of a Thin Client Operating System 2718. The Thin Client 
task and solves the foregoing problems including: Operating System 2718 acts as a messenger between the 
Specification of Goals Mobile Portal Thin Client application 2714 and the support- 
Shoppers begin by indicating at least the general category m S hardware 2 ™ The Thin Client Operating System 2718 
of merchandise they are interested in. Shopping agents need aUows the Moblle Portal ^ 2714 to make faction 
to enable the specification of goals at various degrees of 4S calls to me Thin CKent Operating System 2718 for low level 
specificity. With the present invention these goals may be hardware operations, such as display calls and user input 
refined as the task progresses queries. A separate Thin Client Operating system 2718 can 
Exploration of Product Space be *vetopod for each hardware device 2716 used as the 
i, f , r . t t . . . supporting hardware for the Electronic Valet 2713. This 
Before shoppers can make a selection they need to 5Q a n ows the Mobile Portal Thin Client application 2714 to run 

^.^ll^rl^ ? ^ .* ? T% a ^ DtS on **™t supporting hardware 271 i without the need for 

can aidm .this task by presentmgvanous classes of offermgs ^ mficam * modification 

reviews, demonstrations, etc The present inventive Physical ^ MobUe porla , 271 f receives data from lhe Electronic 

shoppmgagentcanaugmentthisbyprovimngshopperswjth ^et 2jn ^ a packet . switched wifckss network 2732 . 

a tour of the locally available offermgs. 55 Informalion rece / ed through me packet . switcbed wireless 

Refinement of Preferences network is then decoded by the encryption and decryption 

As shoppers learn what is available and examine the clement 2720 of the Mobile Portal 2712. Once the data has 

offerings their preferences evolve. Agents need to enable been decoded the Mobile Portal server 2722 utilizes intel- 

shoppers to refine their preferences over time. The present ii gent agents 2 724, customer intelligence 2726, and cus- 

invention allows the user to refine their preferences. 60 tomer data 2728 10 obtain ^ requested data from third party 

Identification and Comparison of Candidate Products content and service providers 2730. The Mobile Portal 

As shoppers begin to understand what they want and what Server 2722 utilizes intelligent software agents to respond to 

is available they typically compile a list of candidates that customer needs. The software agents 2722 utilize customer 

will be considered more carefully. The present inventive data 2728 to determine to personalize their task to the 

agents supports the construction and maintenance of such 65 individual user's goals, habits and preferences. The cus- 

lists and facilitates the comparison of candidates within the tomer data 2728 is in turn routinely updated by the customer 

list according to various criteria. and by the customer's actions. Each time a user uses the 
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Mobile Portal 2712 a log is kept of the user's queries and 
other uses of the Mobile Portal Platform 2710. In this way, 
the software agents 2724 are able to utilize the user's past 
habits to personalize their task. 
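
The logging and provider-selection behaviour described above can be illustrated with a minimal sketch. It is not the implementation of the Mobile Portal Server 2722; the function name `createPortal`, the provider shape (`service`, `fetch`) and the in-memory `Map` standing in for the customer data 2728 are all assumptions made for illustration, and the decryption step is omitted.

```javascript
// Hypothetical sketch: a portal that logs each query per user and forwards
// the request to matching third party providers. All names are assumptions.
function createPortal(providers) {
  // Stand-in for the customer data store: userId -> { log, preferences }.
  const customerData = new Map();

  function profileFor(userId) {
    if (!customerData.has(userId)) {
      customerData.set(userId, { log: [], preferences: {} });
    }
    return customerData.get(userId);
  }

  return {
    handleRequest(userId, query) {
      const profile = profileFor(userId);
      // Keep a log of each query so later requests can be personalized.
      profile.log.push(query);
      // Forward the query to every provider offering the requested service.
      return providers
        .filter((provider) => provider.service === query.service)
        .map((provider) => provider.fetch(query, profile));
    },
    historyOf(userId) {
      // Return a copy so callers cannot alter the stored log.
      return profileFor(userId).log.slice();
    },
  };
}
```

In this sketch each provider receives the user's profile along with the query, which is one plausible way for an agent to take past habits into account.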

In addition to software agents 2724, the Mobile Portal
Server 2722 utilizes customer intelligence 2726 to respond
to user needs. The user may utilize data-mining and pattern
recognition to find the information he desires. Again, the
customer data 2728 is updated to reflect the user's
data-mining and pattern recognition uses. Third party content and
service providers 2730 are utilized by the Mobile Portal
2712 to provide the services and information requested by
the users. The third party content and service providers may
be accessed through the Internet or through a Mobile Portal
Extranet. The intelligent agent software 2724 searches through
the third party providers to determine the one most suitable
for the user, taking into consideration the customer's profile
contained in the customer data 2728. In this way, the user
may be less specific in their queries than they would have to
be without a user profile. For example, a user can request a
jacket utilizing the Mobile Portal Platform 2710. The
intelligent agents would then utilize the customer data 2728 to
determine more specifically what the customer actually
desired. In this case, the customer data 2728 may include
information that this particular user likes denim jackets as opposed
to leather jackets. The intelligent agents 2724 would then
search for denim jackets. Of course the user profile could be
overridden by the user in order to obtain information that is
contrary to what is stored in the user's profile. Some typical
services provided include geographic location information,
audio and visual editing, personal news & entertainment,
personal shopping, personal health & safety, personal
organizer, personal finance, and personal communication.
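
The jacket example can be sketched as a small merge of stored preferences into an underspecified query. This is only an illustration under stated assumptions: the function name `refineQuery` and the profile layout (`preferences` keyed by item) are hypothetical and do not come from the specification.

```javascript
// Hypothetical sketch: fill in an underspecified query from stored
// preferences, while letting the user override the profile explicitly.
function refineQuery(query, profile, override = {}) {
  const stored = (profile.preferences || {})[query.item] || {};
  // Later spreads win: stored preferences < the query itself < the override.
  return { ...stored, ...query, ...override };
}
```

With a profile of `{ preferences: { jacket: { material: "denim" } } }`, a bare request for a jacket is refined to a denim jacket, while passing `{ material: "leather" }` as the override yields a leather jacket despite the stored preference.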

Geographic location services are typically based on
information received from the integrated Global Positioning
System (GPS) receiver. GPS data is combined with specific user
request data to provide location specific information to the user.
For example, the user may be located in San Francisco and wish
to obtain information on fine dining in the city. The user would
request fine dining information utilizing the Electronic Valet
2713. Location data obtained from the integrated GPS
receiver would be automatically combined with the user
request for fine dining, and the combined message would
then be transmitted to the Mobile Portal 2712. Based on the
data received, the Mobile Portal would select the appropriate
service and transmit the request, in this case fine dining in
San Francisco. The Mobile Portal would then transmit the
response received back to the Electronic Valet 2713. The
user is then presented with the requested information,
formatted and displayed on the display device of the Electronic
Valet 2713.
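
The step in which location data is automatically combined with the user request can be sketched as follows. The message layout and the name `buildLocationRequest` are assumptions for illustration only; the specification does not define a wire format.

```javascript
// Hypothetical sketch: attach the current GPS fix to a user request before
// it is sent to the portal. Field names are illustrative assumptions.
function buildLocationRequest(userRequest, gpsFix) {
  const hasFix =
    gpsFix && typeof gpsFix.lat === "number" && typeof gpsFix.lon === "number";
  if (!hasFix) {
    throw new Error("no GPS fix available");
  }
  return {
    service: userRequest.service,
    text: userRequest.text,
    // The location travels with the request, so the user need not name the city.
    location: { lat: gpsFix.lat, lon: gpsFix.lon },
  };
}
```

A request for fine dining made in San Francisco would thus carry the device coordinates, leaving it to the portal to select a service for that location.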

Audio and visual editing services are typically based on
the data received from the integrated camera and
microphone. The user typically captures images utilizing the
integrated digital camera. However, the user may also obtain
digital images from other sources, such as scanners, e-mail,
and web pages. In addition, the user typically captures sound
files utilizing the integrated microphone. However, audio
files may also be obtained from other sources, such as
e-mail, web pages, and CDs. The image and/or audio data is
combined with specific user request data to provide image
and audio editing capabilities to the user. For example, the
user may capture an image with the integrated digital
camera, and then request to edit the image using a specific
photo editor. The image captured by the integrated digital
camera is then combined with the user's request for photo
editing, and the combined message is then transmitted to the
Mobile Portal 2712. Based on the data received, the Mobile
Portal 2712 selects the appropriate service and transmits the
request, in this case image editing. The Mobile Portal then
transmits the response received back to the Electronic Valet
2713. The user is then presented with the requested
information, formatted and displayed on the display device
of the Electronic Valet 2713. In this case, the user would
receive a user interface for image editing. The user would
then use the image editing user interface to edit the image.
Changes to the image are treated as requests, which the
Mobile Portal 2712 passes on to the image editing
application, running locally or on a separate server.

Bio-Medical Sensor Integration in Accordance with
a Preferred Embodiment

One embodiment of the present invention is an Electronic
Valet including integrated bio-sensors, such as pressure
transducers, respiratory sensors, volumetric sensors, and
defibrillators.

Integrated pressure transducers to measure blood pressure
can be of two types, invasive and noninvasive. Invasive
integrated pressure transducers require the user to embed
part of the unit into the blood stream, while noninvasive
integrated pressure transducers do not need access to the
blood stream. Pressure transducers measure the blood
pressure of the patient and report it to a receiving unit, in this
case the Electronic Valet. The Electronic Valet is then able
to analyze and route the data received from the pressure
transducer utilizing the Mobile Portal Thin Client and
Mobile Portal Server.

Respiratory sensors, such as strain gages and volumetric
sensors, may also be integrated into the Electronic Valet.
Strain gages worn around the chest region change
impedance as the gage expands and contracts according to the
expansion and contraction of the chest during breathing.
Volumetric sensors sense the amount of air pressure passing
through the sensor, such as when a patient breathes into the
volumetric sensor. Both strain gages and volumetric sensors
are able to wirelessly transmit their corresponding data to
the Electronic Valet unit, thus giving the user greater
freedom in their activities. As with data received from pressure
transducers, data received from strain gages and volumetric
sensors may be analyzed and routed utilizing the Mobile
Portal Platform.

Defibrillators integrated into the Electronic Valet may be
utilized to sense heart functions. Defibrillators attach to the
patient utilizing a saline based gel and track heartbeats
through R, T, and P waves. As with strain gages and
volumetric sensors, defibrillators can wirelessly transmit
data to the Electronic Valet, which then analyzes and routes
the data utilizing the Mobile Portal Platform.

The above mentioned bio-sensors can be integrated
individually or in combination with other sensors, such as
environ-sensors, other bio-sensors, or a GPS receiver,
depending on the need of the particular user. For example,
an elderly user with a history of heart problems could have
an Electronic Valet including an integrated defibrillator and
GPS receiver. Utilizing the Mobile Portal Platform, the user
could stay up-to-date on news and information about his
condition, including various food and drugs that could be
harmful.

In addition, the Electronic Valet is capable of sensing
problems that may occur because of the heart condition,
regardless of the location of the user. While walking in the
park, the user may feel chest pains; the Electronic Valet
would sense that the pains are being caused by difficulties
arising because of the user's heart condition. This is
accomplished utilizing the integrated defibrillator and the
Electronic Valet's analysis capabilities. In this case, the data
received from the integrated defibrillator will exceed
predetermined safety thresholds, thus alerting the Electronic
Valet that an emergency has occurred. Utilizing the Mobile
Portal Platform, the Electronic Valet would then notify the
appropriate emergency response unit, forward that heart data
to the user's physician, and notify the user's family.

In addition, the Electronic Valet forwards location
coordinates, received from the integrated GPS receiver, to
the emergency response unit allowing them to locate and
rescue the user. After treatment at the hospital, the Mobile
Portal Platform is able to coordinate the user's after-care
program, including tracking his diet and nutrition, as well as
his exercise routine and medication.

Supporting Code in Accordance with a Preferred
Embodiment

The following code is written and executed in the
Microsoft Active Server Pages environment in accordance
with a preferred embodiment. It consists primarily of
Microsoft JScript with some database calls embedded in the
code to query and store information in the database.



Intention-Centric Interface
Create an Intention ASP Page ("intention_create.asp")

<%@ LANGUAGE = "JScript" %>
<%
Response.Buffer = true;
Response.Expires = 0;
%>
<html>
<head>
<title>Create An Intention</title>
</head>
<body bgcolor="#FFE9D5" style="font-family: Arial" text="#000000">
<%
//Define some variables
upl = Server.CreateObject("SoftArtisans.FileUp")
intention_name = upl.Form("intention_name")
intention_desc = upl.Form("intention_desc")
//intention_name = Request.Form("intention_name")
//intention_desc = Request.Form("intention_desc")
//intention_icon = Request.Form("intention_icon")
submitted = upl.Form("submitted")
items = new Enumerator(upl.Form)
%>
<%
//Establish connection to the database
objConnection = Server.CreateObject("ADODB.Connection")
objConnection.Open("Maelstrom")
%>
<%
//Check to see if the person hit the button and do the appropriate thing
if (submitted == "Add/Delete")
{
    flag = "false"
    //loop through all the inputs
    while (!items.atEnd())
    {
        i = items.item()
        //if items are checked then delete them
        if (upl.Form(i) == "on")
        {
            objConnection.Execute("delete from user_intention where intention_id =" + i);
            objConnection.Execute("delete from intentions where intention_id =" + i);
            objConnection.Execute("delete from tools_to_intention where intention_id =" + i)
            flag = "true"
        }
        items.moveNext()
    }
    // if items were not deleted then insert whatever is in the text field in the database
    if (flag == "false")
    {
        intention_name_short = intention_name.replace(/ /gi, "")
        objConnection.Execute("INSERT INTO intentions (intention_name,intention_desc,intention_icon) values('" + intention_name + "','" + intention_desc + "','" + intention_name_short + ".gif" + "')")
        Response.write("the intentions short name is " + intention_name_short);
        upl.SaveAs("E:development/asp_examples/" + intention_name_short + ".gif")
    }
}
// Query the database to show the most recent items.
rsCustomersList = objConnection.Execute("SELECT * FROM intentions")
%>
<input type="Submit" name="return_to_mcp" value="Go to Main Control Panel" onclick="location.href='default.asp'">
<form method="post" action="intention_create.asp" enctype="multipart/form-data">
<TABLE border=0>
<tr><td colspan="2"><font face="Arial" size="+1"><b>Enter in a new intention</b></font></td></tr>
<tr><td><font face="Arial">Name:</font></td><td><INPUT TYPE="text" name="intention_name"></td></tr>
<tr><td><font face="Arial">Description:</font></td><td><TEXTAREA name="intention_desc"></TEXTAREA></td></tr>
<tr><td><font face="Arial">Icon Image:</font></td><td><INPUT TYPE="file" NAME="intention_icon" size=40></td></tr>
<tr><td colspan="2"><INPUT type="submit" name="submitted" value="Add/Delete"></td></tr>
</TABLE>
<HR>
<font face="Arial" size="+1"><b>Current Intentions</b></font>
<TABLE>
<tr bgcolor=E69780 align="center">
<td>
<FONT color="white">Delete</FONT>
</td>
<TD>
<FONT color="white">Intention</FONT>
</TD>
<TD>
<FONT color="white">Description</FONT>
</TD>
</tr>
<%
//Loop over the intentions in the list
counter = 0;
while (!rsCustomersList.EOF)
{
%>
<tr bgcolor="white" style="font-size:smaller">
<td align=center>
<INPUT type="checkbox" name="<%= rsCustomersList("intention_id")%>">
</TD>
<td>
<%= rsCustomersList("intention_name")%>
</td>
<td>
<%= rsCustomersList("intention_desc")%>
</td>
<td>
<img src="../images/<%= rsCustomersList("intention_icon")%>">
</td>
</tr>
<%
counter++
rsCustomersList.MoveNext()}
%>
</TABLE>
<hr>
Available Tools
</form>
</BODY>
</HTML>



Retrieve Intentions List ASP Page ("intentions_list.asp")

<!-- #include file="include/check_authentication.inc" -->
<HTML>
<HEAD>
<TITLE>mySite! Intentions List</TITLE>
<SCRIPT LANGUAGE="JavaScript">
function intentionsList() {
    this.internalArray = new Array();
<%
    // establish connection to the database
    objConnection = Server.CreateObject("ADODB.Connection");
    objConnection.Open("Maelstrom");
    // create query
    intentionsQuery = objConnection.Execute("SELECT * FROM intentions ORDER BY intention_name asc");
    // write out the options
    numOptions = 0;
    while (!intentionsQuery.EOF) {
        intentionName = intentionsQuery("intention_name");
        intentionIcon = intentionsQuery("intention_icon");
%>
    this.internalArray[<%= numOptions%>] = new Array(2);
    this.internalArray[<%= numOptions%>][0] = "<%= intentionName %>";
    this.internalArray[<%= numOptions%>][1] = "images/<%= intentionIcon %>";
<% numOptions++; intentionsQuery.moveNext(); %>
<% } %>
}
numIntentions = <%= numOptions%>;
intentionArray = new intentionsList().internalArray;
function selectIntention() {
    for (i=0;i<numIntentions;i++) {
        if (IntentionsListSelect.options[i].selected) {
            intentionNameTextField.value = intentionArray[i][0];
            //intentionPicture.src = intentionArray[i][1];
            break;
        }
    }
}
</SCRIPT>
</HEAD>
<BODY BGCOLOR="<%=Session("main_background")%>" style="font-family: Arial">
<CENTER>
<!-- <FORM NAME="intention_list"> -->
<TABLE FRAME="BOX" border=0 CELLPADDING="2" CELLSPACING="2">
<TR><TD COLSPAN="3" STYLE="font: 20pt arial" ALIGN="CENTER"><B>Add a mySite! Intention</B></TD></TR>
<TR><TD COLSPAN="3">&nbsp;</TD></TR>
<TR>
<TD width="100"><font size="-1">Please Select An Intention You Would Like to Add to Your List</font></TD>
<TD colspan=2>
<SELECT ID="IntentionsListSelect" NAME="IntentionsListSelect" SIZE="10" style="font: 9pt Arial;" onClick="selectIntention()">
<%
intentionsQuery.moveFirst();
for (j=0;j<numOptions;j++) { %>
<OPTION VALUE="<%= intentionsQuery("intention_id") %>" <% if (j == 0) { %> SELECTED <% } %>>
<%= intentionsQuery("intention_name") %>
<% intentionsQuery.moveNext()
}
intentionsQuery.moveFirst();
%>
</SELECT>
</TD>
</TR>
<TR><TD COLSPAN="3">&nbsp;</TD></TR>
<TR>
<TD width="100"><font size="-1">Customize the Intention name</font></TD>
<TD COLSPAN="2"><INPUT TYPE="text" NAME="intentionNameTextField" ID="intentionNameTextField" SIZE="30" VALUE="<%= intentionsQuery("intention_name") %>"></TD>
</TR>
<TR><TD COLSPAN="3">&nbsp;</TD></TR>
<TR>
<TD COLSPAN="3" ALIGN="CENTER">
<INPUT TYPE="button" NAME="intentionCancelButton" VALUE="Cancel" SIZE="10" ID="intentionOKButton" onClick="javaScript:top.opener.top.navframe.addAdIntention();">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<INPUT TYPE="button" NAME="intentionCancelButton" VALUE="Cancel" SIZE="10" ID="intentionCancelButton" onClick="self.close();">
</TD>
</TR>
</TABLE>
<!-- </FORM> -->
</CENTER>
<% objConnection.Close(); %>
</BODY>
</HTML>



Display User Intention List ASP Page (excerpted from "navigation.asp")

<DIV ID="intentionsList" style="position: absolute; width:210; height:95; left: 365pt; top: -5; visibility: hidden; font-family: Arial; font-color: #000000; font: 8pt Arial;">
<DIV style="position: absolute; top: 7; left:7; height:78; width:210; z-index:2; background: <%=Session("main_background")%>; border: solid 1pt #000000; padding: 3pt; overflow: auto; alink: black; link: black;">
<body LINK="#000000" ALINK="#000000" vlink="black">
<%
// create query
intentionsQuery = objConnection.Execute("SELECT user_intention.* FROM user_intention, user_intention_to_persona WHERE user_intention_to_persona.user_persona_id = " + Session("currentUserPersona") + " AND user_intention_to_persona.user_intention_id = user_intention.user_intention_id");
numintentions = 0;
Response.Write("<SCRIPT>numintentions=" + intentionsQuery.RecordCount + "</SCRIPT><TABLE cellpadding='0' width='100%' cellspacing='0'>");
while (!intentionsQuery.EOF)
{
%>
<TR><TD><a href="javascript:changeIntention('<%= intentionsQuery("user_intention_id") %>', '<%=numintentions%>')" onmouseover="mouseOverTab()" onmouseout="mouseOutOfTab()"><font color="Black" face="arial" size="-2"><%=intentionsQuery("intent[...]")%></font></a></TD><TD><IMG align="right" SRC="images/delete.gif" alt="Delete this intention" onClick="confirmDelete(<%=intentionsQuery("user_intention_id")%>)"></TD></TR>
<% numintentions++; intentionsQuery.moveNext();
%>
<% }
Response.Write("<SCRIPT>numintentions=" + numintentions + "</SCRIPT>");
%>
<tr><td colspan="2"><hr></td></tr>
<TR><td colspan="2"><a href="javascript:changeIntention('add ...',<%=numintentions%>);" onmouseover="mouseOverTab()" onmouseout="mouseOutOfTab()"><font color="Black" face="arial" size="-2">add ...</font></a></td></TR>
</table>
</body>
</DIV>
<DIV style="position: absolute; top:0; left:-5; width: 230; height: 105; z-index:1;"
onmouseout="intentionlist.style.visibility='hidden'"
onmouseout="intentionlist.style.visibility='hidden'"
onmouseover="intentionlist.style.visibility='hidden'"></DIV>
</DIV>
</DIV>



While various embodiments have been described above, 
it should be understood that they have been presented by 
way of example only, and not limitation. Thus, the breadth 
and scope of a preferred embodiment should not be limited 
by any of the above described exemplary embodiments, but
should be defined only in accordance with the following 
claims and their equivalents. 



What is claimed is: 

1. A method for obtaining personal financial information
on a mobile computing environment utilizing an interface
support framework, comprising the acts of:

a) creating a query based in part on user input on a thin
client computer;

b) querying a network of information utilizing the
interface support framework;

c) receiving a response to the query from the network of
information through the interface support framework;

d) processing information in the response utilizing an
application tool on the thin client computer, wherein
the information in the response is filtered by the
application tool based on one or more personal financial
pattern templates containing information supplied by
the user, and wherein information in each of the one or
more personal financial templates represents a persona
of the user; and

e) displaying any information from the response which
has been selected based on the one or more personal
financial pattern templates to the user.

2. A method for obtaining information on a mobile 
computing environment utilizing an interface support frame- 
work as recited in claim 1, wherein a personal financial 
pattern template includes a plurality of patterns of words. 

3. A method for obtaining information on a mobile
computing environment utilizing an interface support
framework as recited in claim 1, wherein the interface support
framework includes a mobile portal server.

4. A method for obtaining information on a mobile
computing environment utilizing an interface support
framework as recited in claim 1, wherein the interface support
framework includes an intelligent agent processor.

5. A method for obtaining information on a mobile
computing environment utilizing an interface support
framework as recited in claim 1, wherein the interface support
framework includes a customer intelligence framework that
queries customer data.

6. A method for obtaining information on a mobile
computing environment utilizing an interface support
framework as recited in claim 1, wherein the interface support
framework includes a security framework for encrypting and
decrypting information.

7. A method for obtaining information on a mobile
computing environment utilizing an interface support
framework as recited in claim 1, wherein the interface support
framework includes an interface for obtaining third party
content.

8. A method for obtaining information on a mobile
computing environment utilizing an interface support
framework as recited in claim 1, wherein the interface support
framework includes support for mobile clients.

9. An apparatus that obtains personal financial informa- 
tion on a mobile computing environment utilizing an inter- 
face support framework, comprising: 

a) a processor; 

b) a memory that stores information under the control of
the processor;

c) logic that creates a query based in part on user input on
a thin client computer;

d) logic that queries a network of information utilizing the
interface support framework;

e) logic that receives a response to the query from the
network of information through the interface support
framework;

f) logic that processes the information in the response
utilizing an application tool on the thin client computer,
wherein the information in the response is filtered by
the application tool based on one or more personal
financial pattern templates containing information
supplied by the user, and wherein information in each of
the one or more personal financial templates represents
a persona of the user; and

g) logic that displays any information from the response
which has been selected based on the one or more
personal financial pattern templates to the user.

10. A computer program embodied on a computer- 
readable medium that obtains personal financial information 
on a mobile computing environment utilizing an interface 
support framework, comprising: 

a) a code segment that creates a query based in part on
user input on a thin client computer;

b) a code segment that queries a network of information 
utilizing the interface support framework; 

c) a code segment that receives a response to the query 
from the network of information through the interface 
support framework; 

d) a code segment processes the information in the 
response utilizing an application tool on the thin client 
computer, wherein the information in the response is 
filtered by the application tool based on one or more 
personal financial pattern templates containing infor- 
mation supplied by the user, and wherein information 
in each of the one or more personal financial templates 
represents a persona of the user; and 

e) a code segment that displays any information from the 
response which has been selected based on the one or 
more personal financial pattern templates to the user. 

11. A computer program embodied on a computer- 
readable medium that obtains information on a mobile 
computing environment utilizing an interface support frame- 
work as recited in claim 10, wherein a personal finance 
pattern template includes a plurality of patterns of words. 

12. A computer program embodied on a computer- 
readable medium that obtains information on a mobile 
computing environment utilizing an interface support frame- 
work as recited in claim 10, wherein the interface support 
framework includes a mobile portal server. 

13. A computer program embodied on a computer- 
readable medium that obtains information on a mobile 
computing environment utilizing an interface support frame- 
work as recited in claim 10, wherein the interface support 
framework includes an intelligent agent processor. 

14. A computer program embodied on a computer- 
readable medium that obtains information on a mobile 
computing environment utilizing an interface support frame- 
work as recited in claim 10, wherein the interface support 
framework includes a customer intelligence framework that 
queries customer data. 

15. A computer program embodied on a computer- 
readable medium that obtains information on a mobile 
computing environment utilizing an interface support frame- 
work as recited in claim 10, wherein the interface support 
framework includes a security framework for encrypting and 
decrypting information. 

16. A computer program embodied on a computer- 
readable medium that obtains information on a mobile 
computing environment utilizing an interface support frame- 
work as recited in claim 10, wherein the interface support 
framework includes an interface for obtaining third party 
content. 

17. A computer program embodied on a computer- 
readable medium that obtains information on a mobile 
computing environment utilizing an interface support frame- 
work as recited in claim 10, wherein the interface support 
framework includes support for mobile clients. 

18. A method as recited in claim 1, wherein a personal 
financial pattern template is adapted for identifying words 
separated by punctuation, identifying full names by finding 
two capitalized words, parsing out time strings, and identi- 
fying continuous phrases of capitalized words as at least one 
of a company, a topic, and a location. 

* * * * * 
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600 

<HTML>
<HEAD>

<TITLE>Company Name's Home Page</TITLE> 



602 

Company Name
Street Address
City, State Zip 
Phone: ###-###-### 
Fax: ###-###-### 

604 

Copyright (C) 1996 Company Name. All rights reserved. 



Fig.6 
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SYSTEM AND METHOD FOR 
GEOGRAPHICALLY ORGANIZING AND 
CLASSIFYING BUSINESSES ON THE 
WORLD-WIDE WEB 

This application claims priority under 35 U.S.C. Section 
119 based on U.S. application Ser. No. 60/017,548, filed 
May 10, 1996. 

BACKGROUND OF THE INVENTION 

The present invention generally relates to a resource 
discovery system and method for facilitating local com- 
merce on the World-Wide Web and for reducing search time 
by accurately isolating information for end-users. For 
example, distinguishing and classifying business pages on 
the Web by business categories using the Standard Industrial 
Classification (SIC) codes is achieved through an automatic 
iterative process which effectively localizes the Web. 

DESCRIPTION OF THE RELATED ART 

Resource discovery systems have been widely studied and 
deployed to collect and index textual content contained on 
the World-Wide Web. However, as the volume of accessible 
information continues to grow, it becomes increasingly 
difficult to index and locate relevant information. Moreover, 
global flat file indexes become less useful as the information 
space grows causing user queries to match too much infor- 
mation. 

Leading organizations are attempting to classify and 
organize all of Web space in some manner. The most notable 
example is Yahoo, Inc. which manually categorizes Web 
sites under fourteen broad headings and 20,000 different 
sub-headings. Still others are using advanced information 
retrieval and mathematical techniques to automatically bring 
order out of chaos on the Web. 

Solutions to solve this information overload problem have 
been addressed by C. Mic Bowman et al. using Harvest: A 
Scalable, Customizable Resource Discovery and Access 
System. Harvest supports resource discovery through topic- 
specific content indexing made possible by a very efficient 
distributed information gathering architecture. However, 
these topic specific brokers require manual construction and 
they are geared more for academic and scientific research 
than commercial applications. 

Cornell's SMART engine developed by Gerard Salton 
uses a thesaurus to automatically expand a user's search and 
capture more documents. Individual, Inc. uses this system to 
sift through vast amounts of textual data from news sources 
by filtering, capturing, and ranking articles and documents 
based on news industry classification. 

The latest attempts for automated topic-specific indexing 
include the Excite, Inc. search engine which uses statistical 
techniques to build a self-organizing classification scheme. 
Excite Inc.'s implementation is based on a modification of
the popular inverted word indexing technique which takes 
into account concepts (i.e., synonymy and homonymy) and 
analyzes words that frequently occur together. Oracle has 
developed a system called ConText to automatically classify 
documents under a nine-level hierarchy that identifies a 
quarter-million different concepts by understanding the writ- 
ten English language. ConText analyzes a document and 
then decides which of the concepts best describe the docu- 
ment's topic. 

The systems described above all attempt to organize the 
vast amounts of data residing on the Web. However, these 
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mathematical information retrieval techniques for classify- 
ing documents only work when the message of a document 
is directly correlated to the words it contains. Attempts to
isolate documents by regions or to separate business content
from personal content in an automated fashion are not
addressed by any conventional system or structure.

SUMMARY OF THE INVENTION 

It is therefore an object of the present invention to provide 
a method and system for overcoming the above-mentioned 
problems of the conventional methods and techniques. 

The invention is based on a heuristic algorithm which 
exploits common Web page design principles. The key 
challenge is to ascertain the owner of a Web page through an 
iterative process. Knowing the owner of a Web page helps 
identify the nature of the content (business or personal),
which, in turn, helps identify the geographic location.

In a first aspect of the invention, a method of classifying 
a source publishing a document on a portion of a network, 
includes steps of electronically receiving a document, based 
on the document, determining a source which published the 
document, and assigning a code to the document based on 
whether data associated with the document published by the 
source matches with data contained in a database. 

In a second aspect, a search engine is provided for use on 
a network for distinguishing between business web pages 
and personal web pages. The search engine includes a 
mechanism for parsing the content of a hyper-text markup 
language (HTML) at a web address and searching for 
criteria contained therein, a mechanism for analyzing a 
uniform resources locator (URL) of the web address to 
determine characteristics of a web page at the web
address, a mechanism for determining whether the criteria 
match with data contained in a database, and a mechanism 
for cross-referencing a match, determined by the determin- 
ing mechanism, to a second database, to classify a source 
which published the web page. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The foregoing and other objects, aspects and advantages 
will be better understood from the following detailed 
description of a preferred embodiment of the invention with 
reference to the drawings, in which: 

FIG. 1 shows the process flow diagram of a geographi- 
cally bound resource discovery system including three main 
components of the invention (sometimes referred to below as
"MetroSearch") identified as MetroBot, IPLink, and 
YPLink; 

FIG. 2 depicts the IPLink flow chart, the process for 
identifying ISPs and Client Directory Paths; 

FIGS. 3A-3C are sub-processes of the IPLink flow chart 
shown in FIG. 2; 

FIG. 4 depicts the flow chart of YPLink for identifying 
business pages; 

FIG. 5 is a flow diagram for determining if a given 
uniform resources locator (URL) is a Root URL or a Leaf 
URL; and 

FIG. 6 is a template of a typical business home page. 

DETAILED DESCRIPTION OF A PREFERRED 
EMBODIMENT OF THE INVENTION 

Referring now to the drawings, and more particularly to
FIG. 1, there is shown the general arrangement of a preferred 
embodiment according to the present invention. 
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The underlying insight behind the invention is that
individuals and organizations responsible for the design,
creation, and maintenance of their home page generally
follow some basic unwritten rules. These rules can be
exploited to automatically identify the owner of the home
page with a high probability of success. Once the owner of
the home page is determined, an SIC code is assigned to it
by looking up the owner in a Yellow Pages database. If a
matching entry exists, then the owner is a business,
otherwise the owner is deemed to be an individual with a
personal home page.

FIG. 1 shows a preferred architecture for implementing a
geographically bound resource discovery system. The main
components of interest are MetroBot 126, IPLink 113, and
YPLink 112.

The World-Wide Web ("the Web") 124 is based on a
client-server architecture. The Web is the graphical,
multimedia portion of the Internet 120. The client side
program is a Web browser 100 and the server side is a
computer running the HTTPD program 102. The Web server is
accessed through the Internet by specifying a Uniform
Resource Locator (URL). User-entered queries are sent to a
back-end processor or search engine 104 which gathers
results from various databases 106, 108, 110, and 128, and
formats the request and presents them back to the user.

MetroBot 126 is an indexer robot which traverses hyperlinks
and indexes the content into a searchable Web index
database 128. These hyperlinks or URLs point to other Web
pages making it possible to recursively traverse large
portions of the Web from a single well-chosen URL (seed
URL). MetroBot begins its traverse from known Root URL 119
such as the home page of a local Service Provider (SP), such
as an Internet Service Provider (ISP). New links that are
discovered are stored in New URLs database 118. These links
are processed by IPLink 113 and YPLink 112 to extract new
Root URLs at which point the whole process repeats itself.
Furthermore, YPLink periodically supplements its New URL
list by querying global search engines 121 using strategic
keywords (e.g., regional city, county, state names, zip
codes, and industry specific terms).

The first level of localization is achieved by limiting
URLs to registered domain names 106. IPLink extracts
domain names from the New URL database and then queries
the InterNIC database 122 where records of registered
domain names containing company name, contact, street
address, and Internet Protocol (IP) addresses are kept. This
InterNIC database can be accessed through the Unix whois(1)
command. YPLink merges the InterNIC address database 108
with the Yellow Pages data 110. This process is described
in detail below.

The next level of localization is more complex since most
businesses do not have their own registered domain name.
Instead, they have their home page hosted on local SPs (or
ISPs) or Online Service Providers (OSPs) Web Servers.

The first step in solving this problem is for IPLink 113 to
characterize URLs by their IP addresses. FIGS. 2 and
3A-3C show the IPLink flow logic. IPLink identifies the
following attributes based on the IP addresses of New URLs:

True/Virtual Web Servers vs. Shared Web Servers.
ISP vs. Non-ISP hosts.
Root Domain of URLs.
Root Path of URLs.
Client Directory Paths if host is an ISP.

A new URL is retrieved from the New URL database 200
and is parsed into the domain name and directory path
portions. If it is a new domain 205, then its Web IP address
(i.e., www.domain.name) is retrieved using the Internet
Domain Name Service 122. The Unix nslookup(1) utility
210 returns an IP address given a domain name. The
corresponding IP address is stored in the ISP database 114.
A reverse lookup 210 of the Web IP address is also
performed to determine 215 if the given URL is hosted on a
true (or virtual) Web server 220 or a shared Web server 225.
A domain name with its own unique Web IP address indicates
a true or virtual Web server (non-ISP host). Multiple domain
names for a single Web IP address indicates a shared Web
server (ISP host).

The official domain name (Root Domain) 220 and 225 for
the IP address is the domain name of the ISP (master/slave
name server information returned by whois(1) can also be
used to accurately identify the ISP if the Root Domain does
not correspond to the ISP). Root Domain is only used for
displaying URL information on search results, not for
further processing.

Turning to FIG. 3A, for shared servers 225, the Root Path
is determined by searching 300 for the given domain name
in the New URL database 118 and finding common directory
paths 305. If no match is found 315, the URL will
automatically be processed at a later iteration 230,
otherwise the Root Path is set to the matching path 310.

Turning to FIG. 3B, for virtual servers 220, the Root Path
is simply the root directory (/). These servers may or may
not be ISPs. If multiple domain names exist for the given IP
address 320, then it is classified as an ISP 325, otherwise
it is processed at a later iteration 330, 335 and 240. It is
possible for organizations to become ISPs in the future by
adding/hosting new domain names on their existing Web
server.

The directory path where the ISP stores its customers' Web
pages is called the ISP Client Directory path 116. This list
is created manually for a few local ISPs (seed ISPs), and
this is identified automatically 335 by searching for the
given domain name in the Root URL database 119 and
finding common directory paths, as shown in FIG. 3C.
If no match is found, then it is processed at a later
iteration. Matching paths are added to the ISP's Client
Directory Path. This process improves over subsequent
iterations when enough data is gathered and patterns can be
recognized from a large set of ISP Web Servers.

IPLink encompasses the first phase of identifying and
characterizing IP addresses. The next phase is to
automatically identify businesses hosted on ISP Web servers.

FIG. 4 shows the YPLink flow chart. YPLink determines
if a Web page belongs to a business or an individual. YPLink
gets its input, a URL, from IPLink. FIG. 4 shows the flow
diagram for the YPLink process. The first step after
retrieving a URL 400 is determining if it is a "Root URL" or
a "Leaf URL" 405.

A Root URL is the entry point for an organization's or
individual's home page on the World-Wide Web. A Root
URL may or may not be the same as the Home page. Leaf
URLs, on the other hand, are links below an organization's
Root URL. Four factors are considered in determining a
Root URL:

1. Is the URL hosted on a Service Provider's Web Server?
2. Is the URL on a virtual Web Server?
3. Does the URL contain a directory path?
4. Is the directory path a known Service Provider's Client
Directory?

IPLink determines the SP Client Directory Path as
described above. The ISP database 114 contains information
about Client Directories for various ISPs.
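
As a rough illustration of the IPLink characterization described above, the following Python sketch parses a URL into its domain name and directory path portions, labels a domain as hosted on a shared server when its IP address serves several domain names, and looks for a directory path common to a set of URL paths. It assumes the domain-to-IP mapping has already been resolved (for example with nslookup), so no network access is performed. The function names, the example domains, and the reading of "finding common directory paths" as a shared path prefix are assumptions made for illustration, not details taken from the specification.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into its domain name and directory path portions."""
    parts = urlsplit(url)
    return parts.hostname or "", parts.path


def classify_servers(domain_to_ip):
    """Label each domain 'shared' if its IP serves several domains, else 'true'."""
    domains_by_ip = defaultdict(set)
    for domain, ip in domain_to_ip.items():
        domains_by_ip[ip].add(domain)
    return {
        domain: "shared" if len(domains_by_ip[ip]) > 1 else "true"
        for domain, ip in domain_to_ip.items()
    }


def common_directory_path(paths):
    """Return the directory prefix shared by all paths, or None if only '/' is shared."""
    directories = [posixpath.dirname(p) or "/" for p in paths]
    if not directories:
        return None
    common = posixpath.commonpath(directories)
    return None if common == "/" else common
```

A None result from common_directory_path corresponds to the "no match" branch, in which the URL is deferred to a later iteration.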
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FIG. 5 shows the Root URL flow logic. A given URL is 
retrieved 500 and parsed into two components: domain 
name and directory path. The domain name is analyzed to 
see if it is an ISP 502. If multiple IP addresses are associated 
with the domain name, then the domain name is an ISP. If 
the domain name is not an ISP, then the directory path 
component is checked 504. A missing directory path signi- 
fies a Root URL 506, otherwise it is a Leaf URL 508. 

If the domain name is an ISP 510, then it is also a Root 
URL if no directory path exists 512. If a directory path exists 
514, then the path is compared to a list of known ISP Client 
Directory paths. No match 516 indicates a Leaf URL, 
otherwise the directory path level is analyzed 518 for final 
Root URL determination. If the path is one directory level 
below the Client Directory path then it is a Root URL 522, 
otherwise it is a Leaf URL 520. 
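
The Root URL decision of FIG. 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: whether a domain is an ISP is taken as given (a domain is treated as an ISP when it appears in the hypothetical isp_client_dirs mapping), and a final path segment containing a dot is assumed to be a file name rather than a directory. The function names and example domains are illustrative only.

```python
from urllib.parse import urlsplit


def directory_segments(path):
    """Return the directory components of a URL path."""
    segments = [s for s in path.split("/") if s]
    # Assumption: a final segment containing a dot is a file name.
    if segments and "." in segments[-1]:
        segments.pop()
    return segments


def is_root_url(url, isp_client_dirs):
    """Return True for a Root URL and False for a Leaf URL.

    isp_client_dirs maps each known ISP domain to its Client Directory
    paths; a domain absent from the mapping is treated as a non-ISP.
    """
    parts = urlsplit(url)
    domain = parts.hostname or ""
    segments = directory_segments(parts.path)
    if domain not in isp_client_dirs:
        # Non-ISP host: a missing directory path signifies a Root URL.
        return not segments
    if not segments:
        # ISP host with no directory path: the ISP's own Root URL.
        return True
    for client_dir in isp_client_dirs[domain]:
        client_segments = directory_segments(client_dir)
        depth = len(client_segments)
        if segments[:depth] == client_segments:
            # Root URL only if exactly one level below the Client Directory.
            return len(segments) == depth + 1
    # No known Client Directory matches: Leaf URL.
    return False
```

For example, with isp_client_dirs = {"isp.example": ["/users"]}, the URL http://isp.example/users/acme/ is classified as a Root URL, while http://isp.example/users/acme/products/ is a Leaf URL.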

After a URL is determined to be a Root URL, the home
page it points to is analyzed 415 to see if it follows some
basic guidelines. A typical home page layout is illustrated in
FIG. 6. Other than following HTML requirements, there are
no rules or standards for the layout of textual content. The
key pieces of information required to ascertain the owner of
a Web page are 1) company name, 2) zip code, and 3)
telephone number. These three pieces of information do not
have to exist in the Root URL. They can reside anywhere
among various Leaf URLs beneath a Root URL. In many
cases, this information is stored in a file called about.html.
However, the same information could be stored in other,
similarly named files, as would be known to those skilled in
the art taking the present specification as a whole. The
process described below extracts this information
automatically and assigns it to the Root URL being analyzed.

The company's name is usually included in the HTML
TITLE tag 600. However, the company's name could be
included in other locations, as would be known to those
ordinarily skilled in the art within the purview of the present
specification. The layout of the address, if present, usually is
in a standard recognizable format 602. Most businesses also
tend to include copyright notices near the bottom of their
documents. A string search for "copyright", "&copy;", and
"©" is performed near the bottom 604 of the home
page. The company name usually appears near the copyright
notice. A match of the organization or individual's name in
the copyright field 420 and the TITLE field 425 provides the
first indication of the owner of the home page. If no match
is found, then the URL is tagged for further analysis during
the next iteration.

The next step is to analyze the URL for address 430
information. Addresses have an easily identifiable format. In
the U.S., the format is the city name followed by a comma
and then followed by the full state name or abbreviation and
finally a five or nine digit zip code. However, other common
formats/methods also are possible and would be known to
those ordinarily skilled in this art field to locate the zip code.
This string is parsed in the HTML file after stripping all tags
435. The only information required is the 5-digit zip code
since the city and state can be determined by this field alone.
YPLink stores addresses associated with Root URLs and
domain names in an address database 106.

If a phone format field is present then it is also extracted
and stored 440. A U.S. phone field is a 10-digit field where
the first three digits representing the area code are optionally
enclosed in parentheses or separated by a dash, space, or a
period, and then followed by a 7-digit number which is
separated by a dash, space, or a period after the third digit
445. Other similar methods of identifying a phone number
are known to those ordinarily skilled in the art.

The pair consisting of the company name and zip code is
usually enough to identify a business 455. A query is
constructed using this pair and sent to a Yellow Pages
database server. This database is indexed by business names
and zip codes. If a single match is found, then the resulting
SIC code is assigned to the corresponding Root URL 460. If
multiple entries are matched, then the phone field is also
included in the query to assure that only a single entry is
retrieved. If no match is found, then the URL is tagged 465
for further analysis of lower-level hyperlinks during the next
iteration. The matching data is stored in an enhanced Yellow
Pages database 108.

If no match is found at any level, then the page is tagged
450 as a personal page with an SIC code assigned according
to the closest match based on the Business Semantic
Terminology database 110. This database is a proprietary
thesaurus of keywords relating business categories in the
Yellow Pages and other emerging industries such as Internet
technology to extended SIC codes.

While the invention has been described in terms of a
single preferred embodiment, those skilled in the art will
recognize that the invention can be practiced with
modification within the spirit and scope of the appended
claims.

For example, while the invention above has been
described primarily in terms of (e.g., implemented in) a
software process and a system employing software and
hardware, the invention could also be implemented with
hardware as would be known by one of ordinary skill in the
art taking the present specification as a whole.
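
The YPLink extraction and lookup steps in the detailed description (company name from the TITLE tag, 5-digit zip code, 10-digit phone field, then a Yellow Pages query keyed on name and zip code) might be sketched in Python as follows. The regular expressions are one possible reading of the address and phone formats described, not the patented implementation. The function names, the in-memory list standing in for the Yellow Pages database server, and all sample values are hypothetical. The sketch also assumes the TITLE text is exactly the company name, which a real page may not satisfy.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
# ", ST 12345" or ", State Name 12345-6789"; only the 5-digit zip is captured.
ZIP_RE = re.compile(
    r",\s*(?:[A-Z]{2}|[A-Z][a-z]+(?:\s[A-Z][a-z]+)*)\s+(\d{5})(?:-\d{4})?\b"
)
# Area code in parentheses or followed by dash/space/period, then 3 + 4 digits.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\((\d{3})\)\s*|(\d{3})[-. ])(\d{3})[-. ](\d{4})(?!\d)"
)


def extract_owner_fields(html):
    """Extract company name, zip code, and phone digits from a home page."""
    title = TITLE_RE.search(html)
    text = TAG_RE.sub(" ", html)  # strip all tags before parsing the address
    zip_match = ZIP_RE.search(text)
    phone_match = PHONE_RE.search(text)
    phone = None
    if phone_match:
        area = phone_match.group(1) or phone_match.group(2)
        phone = area + phone_match.group(3) + phone_match.group(4)
    return {
        "name": title.group(1).strip() if title else None,
        "zip": zip_match.group(1) if zip_match else None,
        "phone": phone,
    }


def lookup_sic(listings, name, zip_code, phone=None):
    """Return the SIC code of the single matching listing, or None."""
    matches = [r for r in listings if r["name"] == name and r["zip"] == zip_code]
    if len(matches) > 1 and phone is not None:
        # Multiple entries: include the phone field to narrow to one entry.
        matches = [r for r in matches if r["phone"] == phone]
    return matches[0]["sic"] if len(matches) == 1 else None
```

A None result from lookup_sic corresponds to the case in which the URL is tagged for further analysis during the next iteration.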
What is claimed is: 

1. A method of classifying a document published by a 
source on a portion of a network, comprising the steps of: 

electronically receiving a document; 

based on the document, determining a source which 
published the document; and 

assigning a code to said document based on whether data 
associated with the document published by the source 
matches with data contained in a database, 

wherein said portion of said network comprises a graphi- 
cal multimedia portion of said network, said source 
comprises a Web site publishing a home page, and said 
network comprises the Internet. 

2. The method according to claim 1, wherein said data- 
base comprises a Yellow Pages database. 

3. The method according to claim 1, wherein said graphi- 
cal multimedia portion of said network comprises the World- 
Wide Web (WWW) and said document comprises a Web 
document, and 

wherein said step of assigning a code comprises assigning 
a code that classifies the Web document as a first Web 
document type when there is a match of data associated 
with the Web document published by the Web site with 
said data contained in said database, and that classifies 
the Web document as a second Web document type 
when there is no match of data associated with the Web 
document published by the Web site with said data 
contained in the database. 

4. The method according to claim 3, wherein said data- 
base comprises a Yellow Pages database. 

5. The method according to claim 3, wherein the first Web 
document type is a business document and the second Web 
document type is a personal document. 

6. A method of classifying a document published by a 
source on a portion of a network, comprising the steps of: 

electronically receiving a document; 
based on the document, determining a source which 
published the document; and 
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assigning a code to said document based on whether data 
associated with the document published by the source 
matches with data contained in a database, 
wherein said step of determining a source includes: 
extracting a domain name from a predetermined uni- 
form resources locator (URL) database; 
querying a registered domain name database for storing
registered domain names; and
merging addresses from said registered domain name 
database with predetermined data. 

7. The method according to claim 6, wherein said prede-
termined data comprises Yellow Pages data. 

8. The method according to claim 6, wherein said step of 
determining further comprises: 

parsing URLs from the predetermined URL database into 
domain name and directory path portions; and 

determining, based on the domain name, whether the 
URLs from the predetermined URL database are hosted 
on a true server or on a shared server. 

9. The method according to claim 8, wherein the step of
determining further comprises: 

attempting to determine a root path for each URL hosted 
on a shared server. 

10. A search engine for use on a network for distinguish-
ing between business web pages and personal web pages,
comprising:

means for parsing the content of a hyper-text markup
language (HTML) at a web address and searching for
criteria contained therein;

means for analyzing a uniform resources locator (URL) of
the web address to determine characteristics of a web
page at the web address;

means for determining whether said criteria match with
data contained in a database; and

means for cross-referencing a match, determined by said
determining means, to a second database to classify a
source which published the web page,

wherein said criteria include at least one of an address, a
telephone number, a facsimile number, a contact and a
key-word contained in said HTML, and

wherein the characteristics of said web page include a
geographical location and a web page host computer.

11. A search engine for use on a network for distinguish- 
ing between business web pages and personal web pages, 
comprising: 

means for parsing the content of a hyper-text markup 
language (HTML) at a web address and searching for 
criteria contained therein; 

means for analyzing a uniform resources locator (URL) of 
the web address to determine characteristics of a web 
page at the web address; 

means for determining whether said criteria match with 
data contained in a database; and 

means for cross-referencing a match, determined by said 
determining means, to a second database to classify a 
source which published the web page, 

wherein said second database includes a Business Seman- 
tic Terminology database having information related to 
business categories in a Yellow Pages directory. 

12. A search engine for use on a network for distinguish- 
ing between business web pages and personal web pages, 
comprising: 

means for parsing the content of a hyper-text markup 
language (HTML) at a web address and searching for 
criteria contained therein; 
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means for analyzing a uniform resources locator (URL) of
the web address to determine characteristics of a web
page at the web address;

means for determining whether said criteria match with
data contained in a database; and

means for cross-referencing a match, determined by said
determining means, to a second database to classify a
source which published the web page,

wherein said second database includes a Yellow Pages
database.

13. A search engine for use on a network for distinguish-
ing between business web pages and personal web pages,
comprising:

means for parsing the content of a hyper-text markup
language (HTML) at a web address and searching for
criteria contained therein;

means for analyzing a uniform resources locator (URL) of
the web address to determine characteristics of a web
page at the web address;

means for determining whether said criteria match with
data contained in a database; and

means for cross-referencing a match, determined by said
determining means, to a second database to classify a
source which published the web page,

wherein said web page comprises hyperlinks, and said
means for parsing comprises an indexer robot for
traversing said hyperlinks in said web page and a web
page index database,

said indexer robot for indexing a content of said web page
into said web index database.

14. A search engine for use on a network for distinguish-
ing between business web pages and personal web pages,
comprising:

means for parsing the content of a hyper-text markup
language (HTML) at a web address and searching for
criteria contained therein;

means for analyzing a uniform resources locator (URL) of
the web address to determine characteristics of a web
page at the web address;

means for determining whether said criteria match with
data contained in a database; and

means for cross-referencing a match, determined by said
determining means, to a second database to classify a
source which published the web page,

wherein said means for analyzing comprises:

means for determining whether said URL comprises
one of a root URL and a leaf URL.

15. A search engine according to claim 14, wherein said 
root URL comprises an entry point for the web page on the 
World-Wide Web, and a leaf URL comprises a link below a 
root URL, said search engine further comprising: 

means for parsing said URL into a domain name compo- 
nent and a directory path component; 

means for analyzing the domain name in said domain 
name component to determine whether it is associated 
with a service provider (SP); 

means for checking the directory path component to judge 
whether a directory path is missing, when the domain 
name is not associated with an SP, a missing directory 
path indicating a root URL, and for checking whether 
a directory path does not exist to thereby determine that 
said domain name comprises a root URL, when the 
domain name is associated with an SP; 

means for comparing the path to known SP Client Direc- 
tory paths, when a directory path exists; 
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means for analyzing a home page associated with said 
root URL, when said URL is determined to be a root 
URL, thereby automatically to extract home page data 
contained therein; and 

means for assigning the home page data to the Root URL 
being analyzed. 

16. A method of indexing textual content on the world- 
wide web, comprising: 

robotically traversing the world-wide web to identify 
uniform resource locators; and 

determining whether the identified uniform resource loca- 
tors are associated with a business or an individual, 

wherein the determining step comprises: 

extracting ownership data from content associated with 
the identified uniform resource locators; 




querying a business listing database based on the 
ownership data; and 

determining that the identified uniform resource loca- 
tors are associated with businesses if the querying 
matches the ownership data to a business listing in 
the business listing database. 

17. The method according to claim 16, further compris- 
ing: 

assigning business category codes to the uniform resource
locators associated with businesses. 

18. The method according to claim 17, wherein the 
business category codes are the Standard Industrial Classi- 
fication (SIC) codes. 
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[57] ABSTRACT 

Embodiments of the present invention use a new extension 
to the HTML language to support remotely specified named 
anchors. A remotely specified named anchor, when embed- 
ded within a source document, instructs a browser program 
to access a portion of a destination document indicated in the 
remotely specified named anchor. When the browser pro- 
gram reads a remotely specified named anchor such as 

<a href="http://foo.com/bar.html/SCROLL="Some Text">

from the source document, the browser program performs
the following steps: 1) the browser retrieves the destination
file "bar.html" from the server "foo.com", 2) the browser
searches the file bar.html for "Some Text", and 3) if the
browser finds the character string being searched for, then
the browser displays the file bar.html, scrolled to the line
containing the first character of the character string being
searched for.
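
Steps 2 and 3 above can be illustrated with a short Python sketch that finds the line a browser would scroll to. It assumes the destination file has already been retrieved and is available as a string; the function name scroll_target_line is hypothetical, and retrieval and display are outside the sketch.

```python
def scroll_target_line(document_text, search_text):
    """Return the 0-based index of the line containing the first character
    of the first occurrence of search_text, or None if it is not found."""
    offset = document_text.find(search_text)
    if offset < 0:
        return None
    # The line index equals the number of newlines before the match.
    return document_text.count("\n", 0, offset)
```

When the string is not found, the function returns None, leaving the choice of fallback display to the caller.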
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<!-- HEAD_START -->

<HTML>

<HEAD>

<TITLE>Sun Microsystems</TITLE>

<!-- META NAME="owner" VALUE="hooper@bcci.eng.sun.com" -->
</HEAD>

<BODY>

<!-- HEAD_END -->

<A HREF="/cgi-bin/imagemap/960101/homepage.9601.map"><IMG BORDER=0 SRC="/share/images/homepage.9601.color.580x576.gif" ALT="Highly graphic homepage" ISMAP></A>
<P>

Sun Microsystems <A HREF="/960101/index.textonly.html">text-only</A> home page.

<!-- FOOT_START -->
<HR>

<FONT SIZE=2>Questions or comments regarding this service?

<A HREF="/cgi-bin/comment-form.pl?/960101/index.html"><EM>webmaster@sun.com</EM>

</A></FONT>

<P>

<H5><A HREF="/share/text/SMIcopyright.html">Copyright</A> 1996 Sun Microsystems,
Inc., 2550 Gracia Ave., Mtn View, CA 94043-1100 USA. All Rights Reserved</H5>
</BODY>
</HTML>

<!-- FOOT_END -->



FIG. 2 



<html> 
<head> 
<title> 

</title> 
</head>

<!-- this is a comment -->
<body> 

<address> 

</address> 

</body> 

</html> 



FIG. 3 
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METHOD AND SYSTEM FOR 
IMPLEMENTING HYPERTEXT SCROLL 
ATTRIBUTES 

FIELD OF THE INVENTION 

Aspects of the present invention provide a method and 
system for remotely specifying which section of a hypertext 
document to display on a user's computer. 

BACKGROUND OF THE INVENTION 

HTML is a "markup" language which allows an author to 
turn a simple text document into a hypertext document for 
the World Wide Web ("the web"). FIG. 1 is an example of 
a hypertext document from Sun Microsystems as viewed 
through a browser from Netscape Communications, Inc. 
FIG. 2 illustrates the HTML source code which describes the 
hypertext document of FIG. 1. 

The HTML markup language is analogous in some ways 
to the formatting codes used in word processing documents. 
A word processing document viewed through a word pro- 
cessing program is actually a combination of the text that 
you see and a series of hidden formatting codes (e.g., 
carriage return, bold, underline) which instruct the word 
processing program to display the word processing docu- 
ment in a specified way. Similarly, a hypertext document is 
actually a combination of the text that you see and a series 
of hidden "tags" or "anchors" (for new paragraphs, graphics 
images, hypertext links, etc.) which instruct the browser 
program to display the hypertext document in a specified 
way. 

A hypertext document is usually broken down into 
sections, with each section delineated by one or more HTML 
tags. HTML tags are formatting codes surrounded by the 
characters < and > (less than and greater than symbols). 
Some HTML tags have a start tag and an end tag. In general, 
end tags are in the format </"symbol"> where the "symbol" 
is the character string found between the characters < and > 
in the start tag. FIG. 3 is an example of a series of HTML 
document tags forming a template for a typical hypertext 
document. For example, the document of FIG. 3 is defined 
as an HTML document using the tags <html> and 
</html>. Then the "head" to the document, which typically 
includes a title, is defined using the tags <head>, </head>, 
<title>, and </title>, respectively. Following the head comes 
the "body" of the document which is often organized into 
subtopics with different levels of headings. The body is 
defined by the tags <body> and </body>. Headings are 
indicated by the tags <h#> and </h#>, where # is the level of 
the heading. Heading levels indicate the relative size of the 
heading. Heading level 1 is the largest heading size and 
heading level 6 is the smallest heading size. Finally, it is 
good practice to indicate the author of the document at the 
bottom of the document using the tags <address> and 
</address>. FIG. 4 summarizes this information in a table 
format. 

Once the HTML template has been established, text is 
added to create a basic hypertext document. In order to 
improve readability, the author adds HTML character and 
paragraph formatting tags to the document. For example, the 
<p> tag instructs the browser to begin a new paragraph. If an 
author wants to highlight some text in bold, the author 
inserts the <b> tag at the beginning of the text to be 
highlighted and inserts a </b> tag at the end of the text to be 
highlighted. The tags <i> and </i> indicate text to display in 
italics. FIG. 5 illustrates additional tags for formatting 
characters and paragraphs. 



If HTML were merely made up of the document, 
paragraph, and character formatting tags discussed above, it 
would only allow an author to define a document which 
stands by itself. Fortunately, additional HTML tags allow an 
author to "link" documents together. If a reader of a hypertext 
document wants to know more about a topic before 
reading the rest of the current hypertext document, the 
reader selects a "link" or "hot link", which retrieves and 
displays a new document that provides related information. 
FIG. 6 illustrates a hypertext document (i.e., a "source 
document") on Thomas Jefferson with a hot link named "the 
American Constitution". The link could take the reader to a 
second hypertext document (i.e., a "destination document") 
which, for example, displays the text of the American 
Constitution or which provides more information on Thomas 
Jefferson's role in the drafting of the American Constitution. 

In HTML, a hot link to a destination document is made by 
placing a "reference anchor" around the text to be highlighted 
(e.g., "the American Constitution") and then providing 
a network location where the destination document is 
located. Reference anchors extend the idea of start and end 
tags. A reference anchor is created when the start tag <a> and 
the end tag </a> are placed around the text to be highlighted 
(e.g., <a> the American Constitution </a>). Then attribute 
information that identifies the network location of the destination 
document is inserted within the <a> reference tag. In 
HTML, the "href=" attribute, followed by the network 
location for the destination document, is inserted within the 
<a> tag. For example, 

<a href="network location for the destination document"> the American Constitution </a> 

illustrates the basic format for a reference anchor. On the web, network 
locations of hypertext documents are provided using 
the Universal Resource Locator ("URL") naming 
scheme. FIG. 7 illustrates the primary components of a 
URL. 

A service type 701 is a required part of a URL. The 
service type tells the user's browser how to contact the 
server for the requested data. The most common service type 
is the HyperText Transport Protocol or http. The web can 
handle several other services including gopher, wais, ftp, 
netnews, and telnet and can be extended to handle new 
service types. A system name 703 is also a required part of 
a URL. The system name is the fully qualified domain name 
of the server which stores the data being requested. A port 
705 is an optional part of a URL. Ports are the network 
socket addresses for specific protocols. By default, http 
connects at port 80. Ports are only needed when the server 
does not communicate on the default port for that service. A 
directory path 707 is a required part of a URL. Once 
connected to the system in question, a path to the file must 
be specified. A filename 709 is an optional part of a URL. 
The file name is the data file itself. The server can be 
configured so that if a filename isn't specified, a default file 
or directory listing is returned. A search component 711 is 
another optional part of a URL. If the URL is a request to 
search a data base, the query can be embedded in the URL. 
The search component is the text after the ? or # in a URL. 
Substituting the URL "http://system/dir/file.html" into the 
example above, the reference anchor 

<a href="http://system/dir/file.html"> the American Constitution </a> 

identifies an html file to retrieve and display when a user 
selects the "the American Constitution" hot link. 
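
As an illustration only, the URL components described above can be separated with Python's standard urllib.parse module. This is a minimal sketch and not part of the described system; the function name describe_url, the dictionary keys, and the sample URLs are assumptions introduced here.

```python
import posixpath
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the components named in the text (illustrative helper)."""
    parts = urlsplit(url)
    # Separate the directory path from the (optional) filename.
    directory, filename = posixpath.split(parts.path)
    return {
        "service_type": parts.scheme,       # e.g. "http"
        "system_name": parts.hostname,      # domain name of the server
        "port": parts.port,                 # None when the default port is used
        "directory_path": directory,
        "filename": filename,               # empty string when omitted
        "search": parts.query or parts.fragment,  # text after "?" or "#"
    }


if __name__ == "__main__":
    for name, value in describe_url("http://system/dir/file.html").items():
        print(f"{name}: {value!r}")
```

In this sketch a missing port is reported as None rather than filled in with the default, which mirrors the statement that a port is only needed when the server does not use the default port for the service.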

Sometimes an author may want to direct the reader's 
attention not to the destination document as a whole but to 
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a specific part of the destination document. For example, 
instead of pointing the reader to the beginning (i.e., the 
Preamble) of the American Constitution, an author may 
want to point the reader directly to the 10th Amendment 
(i.e., Article X) of the American Constitution. Hypertext 
links that point to a specific point in a destination document 
are known as named anchors. Named anchors are essentially 
modified reference anchors. Continuing with the example 
above, if an author wants to point to the section on the 10th 
Amendment within a destination document containing 
HTML source code for the entire American Constitution, 
then the author follows a two step process. First the author 
modifies the HTML source code for the destination document 
by inserting a "NAME" attribute within an <a> tag 
which is inserted before the start of the section on the 10th 
Amendment. For example, the tag 

<a NAME="10th Amendment"> Article X </a> 

could be inserted into the destination document's HTML 
source code before the start of the section on Article X. To 
reference this point, the author of the source document 
creates a named anchor in the source document which uses 
a # character to reference the "10th Amendment" NAME 
attribute in the destination document. For example, the 
named anchor: 

<a href="http://system/dir/file.html#10th Amendment"> the 10th 
Amendment </a> 

identifies the section on Article X as the section to retrieve 
and display when a user selects the hot link "the 10th 
Amendment". 
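
The conventional named-anchor lookup described above can be sketched as follows. This is a hedged illustration using Python's standard html.parser and urllib.parse modules, not the behavior of any particular browser; the class NameAnchorFinder, the function find_named_anchor, and the sample destination text are assumptions introduced here.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class NameAnchorFinder(HTMLParser):
    """Records the line of the first <a NAME=...> tag matching a wanted name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.line = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a" and self.line is None:
            for key, value in attrs:
                if key == "name" and value == self.wanted:
                    self.line = self.getpos()[0]  # 1-based line number


def find_named_anchor(href, destination_html):
    """Return the line of the NAME anchor referenced after '#', or None."""
    fragment = unquote(urlsplit(href).fragment)
    if not fragment:
        return None
    finder = NameAnchorFinder(fragment)
    finder.feed(destination_html)
    finder.close()
    return finder.line


# Hypothetical destination document, three lines long.
DESTINATION_HTML = (
    "<h1>The Constitution</h1>\n"
    "<p>Preamble ...</p>\n"
    '<a NAME="10th Amendment"> Article X </a>\n'
)
```

The lookup succeeds only because the destination text already contains the NAME attribute; with that tag removed, find_named_anchor returns None, which is the situation the remainder of the description addresses.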

An implicit assumption of the example set forth above is 
that the author of the source document has permission to edit 
and modify the destination document in order to add a 
"NAME" attribute before the section on Article X. At the 
very least the author of the source document has to be able 
to convince the author of the destination document to add 
such a "NAME" attribute before the section on Article X. 
However, since the web is a distributed, network-based 
hypertext system, the author of the source document may in 
fact not have access to the destination document. Thus, it 
would be beneficial to provide a method and system, which 
allows browsers to automatically display sections of desti- 
nation documents, even though those sections do not include 
embedded NAME attributes. 

SUMMARY OF THE INVENTION 

Embodiments of the present invention use a new exten- 
sion to the HTML language to support remotely specified 
named anchors. A remotely specified named anchor, when 
embedded within a source document, instructs a browser 
program to access a portion of a destination document 
indicated in the remotely specified named anchor. One 
benefit of the present invention over previously imple- 
mented named anchors is that embodiments of the present 
invention provide this functionality even when the indicated 
portion of the destination document does not contain a 
"NAME" attribute. In this way, an author of a source 
document can create a hot link which scrolls to an indicated 
portion of a destination document even though the author of 
the source document is unable to modify, or have modified, 
the source code of the destination document to include a 
"NAME" attribute. 

In one embodiment, when the browser program reads a 
remotely specified named anchor such as: 

<a href=http://foo.com/bar.html SCROLL="Some Text"> 
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the browser program performs the following steps: 1) the 
browser retrieves the file "bar.html" from the server 
"foo.com", 2) the browser searches the file bar.html for 
"Some Text", and 3) if the browser finds the character string 
being searched for, then the browser displays the file 
bar.html scrolled to the line containing the first character of the 
character string being searched for. 

The present invention also provides graceful degradation 
to support legacy browsers. If a remotely specified named 
anchor such as: 

<a href=http://foo.com/bar.html SCROLL="Some Text"> 

is read by a browser program which does not support the 
new HTML extension of the present invention, then the 
legacy browser will simply ignore the SCROLL attribute 
and will instead display the destination file bar.html in the 
normal fashion, i.e., scrolled to the top of the file. 

NOTATIONS AND NOMENCLATURE 

The detailed descriptions which follow are presented 
largely in terms of methods and symbolic representations of 
operations on data bits within a computer. These method 
descriptions and representations are the means used by those 
skilled in the data processing arts to most effectively convey 
the substance of their work to others skilled in the art. 

A method is here, and generally, conceived to be a 
self-consistent sequence of steps leading to a desired result. 
These steps require physical manipulations of physical 
quantities. Usually, though not necessarily, these quantities 
take the form of electrical or magnetic signals capable of 
being stored, transferred, combined, compared, and otherwise 
manipulated. It proves convenient at times, principally 
for reasons of common usage, to refer to these signals as 
bits, values, elements, symbols, characters, terms, numbers, 
or the like. It should be borne in mind, however, that all of 
these and similar terms are to be associated with the appropriate 
physical quantities and are merely convenient labels 
applied to these quantities. 

Useful machines for performing the operations of the 
present invention include general purpose digital computers 
or similar devices. The general purpose computer may be 
selectively activated or reconfigured by a computer program 
stored in the computer. A special purpose computer may also 
be used to perform the operations of the present invention. 
In short, use of the methods described and suggested herein 
is not limited to a particular computer configuration. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is an example of a hypertext document from Sun 
Microsystems as viewed through a browser from Netscape 
Communications, Inc. 

FIG. 2 illustrates HTML source code which describes the 
hypertext document of FIG. 1. 

FIG. 3 is an example of a series of HTML document tags 
forming a template for a typical hypertext document. 

FIG. 4 summarizes information regarding HTML document 
tags. 

FIG. 5 summarizes information regarding HTML character 
and paragraph tags. 

FIG. 6 illustrates a hypertext document on Thomas Jefferson 
with a hot link named "the American Constitution". 

FIG. 7 illustrates the primary components of a Universal 
Resource Locator ("URL"). 

FIG. 8 is a block diagram of a computer system for 
practicing the preferred embodiment of the present invention. 



04/14/2003, EAST Version: 1.03.0007 




FIG. 9 is a flow diagram which illustrates the preferred 
steps taken to access a portion of a destination document 
identified in a remotely specified named anchor, even when 
the destination document does not contain a "NAME" 
attribute. 

DETAILED DESCRIPTION 

Overview Of The Preferred Method 

Embodiments of the present invention use a new exten- 
sion to the HTML language to support remotely specified 
named anchors. A remotely specified named anchor, when 
embedded within a source document, instructs a browser 
program to access a portion of a destination document 
indicated in the remotely specified named anchor. When the 
browser program reads a remotely specified named anchor 
such as: 

<a href=http://foo.com/bar.html SCROLL="Some Text"> 

from the source document, the browser program performs 
the following steps: 1) the browser retrieves the destination 
file "bar.html" from the server "foo.com", 2) the browser 
searches the file bar.html for "Some Text", and 3) if the 
browser finds the character string being searched for, then 
the browser displays the file bar.html, scrolled to the line 
containing the first character of the character string being 
searched for. 

One benefit of the present invention over previously 
implemented named anchors is that embodiments of the 
present invention provide this functionality even when the 
indicated portion of the destination document does not 
contain a "NAME" attribute. In this way, an author of a 
source document can create a hot link which scrolls to an 
indicated portion of a destination document even though the 
author of the source document is unable to modify, or have 
modified, the source code of the destination document to 
include a "NAME" attribute. 
Overview Of The Preferred System 

FIG. 8 is a block diagram of a computer system 800 for 
practicing the preferred embodiment of the present invention. 
The computer system 800 includes a user computer 
801, a source document server computer 803, a destination 
document server computer 805, and a network communications 
mechanism 807. 

The user computer 801 includes a processor 809, a 
memory 811, and an interface 813 for facilitating input and 
output in the user computer 801. The memory 811 stores a 
number of items, including a browser 815, and an operating 
system 817. The preferred browser is a Java™ enabled 
browser such as HotJava™ from Sun Microsystems, Inc., of 
Mountain View, Calif.¹ The preferred operating system is 
the Solaris™ operating system from Sun Microsystems, Inc. 
1. Sun and Solaris are trademarks or registered trademarks of Sun Microsystems, 
Inc., in the United States and other countries. 

The source document computer 803 includes a processor 
819, a memory 821, and an interface 823 for facilitating 
input and output in the source computer 803. The memory 
821 stores a number of items, including a source document 
825, and an operating system 827. The preferred operating 
system is the Solaris™ operating system from Sun Micro- 
systems, Inc. of Mountain View, Calif. 

The preferred source document is a text document inter- 
spersed with constructs of the HTML markup language. 
Another possibility would be a text document marked up 
with SGML (Standard Generalized Markup Language). In 
general, this embodiment does not require that the source 
document be encoded in HTML. It is preferred, however, that 
the document contain one or more URLs. For example, this 
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patent application could be a source document, and in fact 
it would be quite convenient to be able to refer in this patent 
application to specific examples of web pages, hot links, and 
reference anchors. If the source document is not encoded in 
HTML, SGML, or some other standard format, it becomes 
more difficult, but certainly not impossible, to recognize the 
URLs. 

This embodiment of the invention does rely on the 
information being represented as text, though there is no 
requirement that the text be encoded in ASCII. For use with 
other languages, text may be encoded in Unicode (the 
preferred embodiment for non-European languages) or any 
other text encoding scheme that has the simple property of 
allowing the computer to compare a string with a substring 
of the entire file and determine whether the string is identical 
to the substring. 

The destination document computer 805 includes a processor 
829, a memory 831, and an interface 833 for facilitating 
input and output in the destination computer 805. The 
memory 831 stores a number of items, including a destination 
document 835, and an operating system 837. The 
preferred destination document is a text document interspersed 
with constructs of the HTML markup language. The 
preferred operating system is the Solaris™ operating system 
from Sun Microsystems, Inc. of Mountain View, Calif. The 
network communications mechanism 807 provides a mechanism 
for facilitating communication between the user computer 
801, the source document server 803, and the destination 
document server 805. 

It should be noted that the user computer 801, the source 
document server 803, and the destination document server 
805 may all contain additional components not shown in 
FIG. 8. For example, each computer could also include some 
combination of additional components including a video 
display device, an input device, such as a keyboard, mouse, 
or pointing device, a CD-ROM drive, and a permanent 
storage device, such as a disk drive. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 

The preferred operation of the system in FIG. 8 is perhaps 
best described by way of example. FIG. 9 is a flow diagram 
which illustrates the preferred steps taken to access a portion 
of a destination document identified in a remotely specified 
named anchor, even when the destination document does not 
contain a "NAME" attribute. First, a browser program reads 
a remotely specified named anchor from a source document. 
Then the browser parses the remotely specified named 
anchor and retrieves the name of the file to access, and the 
network location of the server that stores the file. After 
retrieving the file, the browser searches the file for the 
indicated text. If the indicated text is found, then the browser 
displays the file starting at the first character of the indicated 
text. 

In step 901 the browser retrieves a source document. 
Typically, the source document will be identified by a URL 
supplied by the user in an "Open File" dialog box displayed 
by the browser. In step 903 the browser displays the source 
document on the user's computer. In step 905 the browser 
receives input by the user on the displayed source document. 
In step 907 the browser determines whether the user selected 
a hot link containing a remotely specified named anchor. If 
the user requested an operation other than selecting a hot 
link containing a remotely specified anchor then the browser 
merely performs the requested operation using techniques 
available in the prior art (step 909). If, however, the user did 
select a hot link containing a remotely specified named 
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anchor then, in steps 911 through 923, the browser processes 
the remotely specified named anchor in order to access a 
specified portion of a destination document identified in the 
remotely specified named anchor, even though the specified 
portion of the destination document does not contain an 
HTML "NAME" attribute associated with it. 

In step 911 the browser parses the remotely specified 
named anchor to obtain the name of the file to retrieve, as 
well as the name of the server storing the file. In step 913 the 
browser retrieves the file from the server. In step 915 the 
browser parses the remotely specified named anchor in order 
to retrieve the character string for which to search. Those of 
ordinary skill in this art will understand that, alternatively, 
the browser could, in step 911, parse the remotely specified 
named anchor to retrieve the character string to search for. 
In step 917 the browser searches the retrieved file for the 
character string. If the character string is not found in the file 
(step 919) then the retrieved file is displayed to the user 
starting at the top of the file (step 921). If, however, the 
character string is found in the retrieved file then the browser 
displays the file scrolled to the line containing the first 
occurrence of the character string being searched for (step 
923). 
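
Steps 911 through 923 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the preferred embodiment: the names AnchorAttributes, parse_remote_anchor, scroll_line, and follow_remote_anchor are hypothetical, the retrieval of step 913 is delegated to a caller-supplied fetch function so that no network access is needed, and the search is performed on the raw file text.

```python
from html.parser import HTMLParser


class AnchorAttributes(HTMLParser):
    """Collects the attributes of the first <a> start tag encountered."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.attrs is None:
            self.attrs = dict(attrs)  # attribute names arrive lowercased


def parse_remote_anchor(anchor_html):
    """Steps 911 and 915: extract the file location and the search string."""
    parser = AnchorAttributes()
    parser.feed(anchor_html)
    parser.close()
    attrs = parser.attrs or {}
    return attrs.get("href"), attrs.get("scroll")


def scroll_line(document_text, search_string):
    """Steps 917 to 923: 0-based line to scroll to; 0 (top of file) if absent."""
    if not search_string:
        return 0
    offset = document_text.find(search_string)  # first occurrence only
    if offset < 0:
        return 0
    return document_text.count("\n", 0, offset)


def follow_remote_anchor(anchor_html, fetch):
    """Run the whole sequence; fetch(url) stands in for step 913."""
    href, search_string = parse_remote_anchor(anchor_html)
    document_text = fetch(href)
    return document_text, scroll_line(document_text, search_string)
```

A caller might pass a function that performs an HTTP request as fetch; in a test, a lambda returning a fixed string is enough. An anchor with no SCROLL attribute yields a scroll line of 0, which matches displaying the file from the top.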

In this way, a new method and system are provided which 
access a portion of a destination document identified in a 
remotely specified named anchor, even when the destination 
document does not contain a "NAME" attribute. 

One weakness of the preferred embodiment is that it does 
not allow the author of the source document to point to a 
second occurrence of a character string. Consider as an 
example the following file: 

1. Socrates was a man. 

2. All men are mortal. 

3. Therefore, Socrates was mortal. 

4. So his ultimate downfall was due to the fact that 
Socrates was a man. 

For example, the preferred embodiment will not link to the 
character string "Socrates was a man" in line 4 in response to the 
remotely specified named anchor 

<a href=http://foo.com/socratestory.html SCROLL="Socrates was a man"> 

because the preferred embodiment will instead scroll to the 
first occurrence of the string "Socrates was a man" in line 1 
of the file. This limitation is not severe, however, since it will 
normally be the case that the author can merely keep adding 
to the character string of choice until it is uniquely identified. 
For example, if it indeed was desired to link to the occurrence 
of "Socrates was a man" in line 4, then the author 
could merely search for the string "that Socrates was a man". 
In this way, the preferred embodiment would scroll the file 
to line 4, as desired. Though not a perfect solution, this 
solution will be adequate in almost all cases. 
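
The first-occurrence behavior and the workaround of lengthening the search string can be illustrated with a short Python sketch. The helper names first_matching_line and is_unique are assumptions introduced here, and the four-line file is stored as a list of strings purely for convenience.

```python
# The example file from the text, one list entry per numbered line.
SOCRATES_FILE = [
    "Socrates was a man.",
    "All men are mortal.",
    "Therefore, Socrates was mortal.",
    "So his ultimate downfall was due to the fact that Socrates was a man.",
]


def first_matching_line(lines, search_string):
    """Return the 1-based number of the first line containing the string."""
    for number, line in enumerate(lines, start=1):
        if search_string in line:
            return number
    return None


def is_unique(lines, search_string):
    """True when the string occurs exactly once in the whole file."""
    return "\n".join(lines).count(search_string) == 1


if __name__ == "__main__":
    for candidate in ("Socrates was a man", "that Socrates was a man"):
        print(candidate, "->", first_matching_line(SOCRATES_FILE, candidate),
              "unique:", is_unique(SOCRATES_FILE, candidate))
```

An author could use a check like is_unique while composing the anchor, extending the string until it reports a single occurrence.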

In general, embodiments of the invention apply to any 
system where it is desired to be able to point to a specific part 
of a larger whole even when one cannot get access to this 
larger whole to insert a reference marker. 

There is an alternative, much simpler, way of solving one 
aspect addressed by the present invention, but it is not 
recommended. One could simply point to the character 
position (offset) of the desired scroll within the destination 
file. The reason this is not recommended is that the character 
position will change if the owner of the destination file edits 
it. Editing the file may also change the search string but this 
is much less likely to happen than the more common case 
where the author adds or deletes text in another part of the 
document. 
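
The fragility of a stored character offset, compared with a search string, can be seen in a small Python sketch. The sample document text and the helper names text_at and locate are assumptions made for illustration.

```python
ORIGINAL = "Preamble.\nArticle X: powers reserved to the states.\n"
TARGET = "Article X"

# Offset recorded by a hypothetical source-document author.
STORED_OFFSET = ORIGINAL.index(TARGET)

# The destination owner later adds text elsewhere in the file.
EDITED = "A newly added introduction.\n" + ORIGINAL


def text_at(document, offset, length):
    """Return the text found at a fixed character offset."""
    return document[offset:offset + length]


def locate(document, search_string):
    """Return the offset of the first occurrence, or -1 if absent."""
    return document.find(search_string)


if __name__ == "__main__":
    print(repr(text_at(ORIGINAL, STORED_OFFSET, len(TARGET))))
    print(repr(text_at(EDITED, STORED_OFFSET, len(TARGET))))
    print(locate(EDITED, TARGET))
```

After the edit, the stored offset points into the newly added text, while the string search still finds the intended passage at its new position.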
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While specific embodiments have been described herein 
for purposes of illustration, various modifications may be 
made without departing from the spirit and scope of the 
invention. Accordingly, the invention is not limited to the 
above described embodiments, but instead is defined by the 
claims which follow, along with their full scope of equiva- 
lents. 

What is claimed is: 

1. A method executed in a network computer system for 
facilitating access to a specified portion of data stored at a 
remote location, the method comprising the steps of: 

retrieving a source document, the source document 
including hypertext links to other data on the network; 

displaying the source document; 

receiving input entered on the source document; 

determining whether the input comprises selection of a 
remotely specified named anchor; 

when the input comprises selection of a remotely specified 
named anchor, retrieving data indicated in the 
remotely specified named anchor and displaying a 
portion of the data specified in the remotely specified 
named anchor, wherein the specified portion of the data 
does not have a position marker associated with it. 

2. The method of claim 1 wherein the step of determining 
further comprises the step of: 

examining the remotely specified named anchor to determine 
whether it contains an attribute indicating that a 
specified portion of the retrieved data should be displayed. 

3. The method of claim 2 wherein the attribute is a 
SCROLL attribute. 

4. The method of claim 1 wherein the step of displaying 
the portion of the data specified in the remotely specified 
named anchor further comprises the step of: 

examining the remotely specified named anchor to determine 
a character string to search for. 

5. The method of claim 4 further comprising the steps of: 

searching the retrieved data for the character string; and 

displaying the portion of the data containing the character 
string. 

6. A network computer system for facilitating access to a 
specified portion of data stored at a remote location, the 
system comprising: 

a mechanism configured to retrieve a source document, 
the source document including hypertext links to other 
data on the network; 

a mechanism configured to display the source document; 

a mechanism configured to receive input entered on the 
source document; 

a mechanism configured to determine whether the input 
comprises selection of a remotely specified named 
anchor; 

a mechanism configured to, when the input comprises 
selection of a remotely specified named anchor, retrieve 
data indicated in the remotely specified named anchor 
and display a portion of the data specified in the 
remotely specified named anchor, wherein the specified 
portion of the data does not have a position marker 
associated with it. 

7. The system of claim 6 wherein the mechanism config- 
ured to determine further comprises: 

a mechanism configured to examine the remotely speci- 
fied named anchor to determine whether it contains an 
attribute indicating that a specified portion of the 
retrieved data should be displayed. 
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8. The system of claim 7 wherein the attribute is a 
SCROLL attribute. 

9. The system of claim 6 wherein the mechanism configured 
to display the portion of the data specified in the 
remotely specified named anchor further comprises: 

a mechanism configured to examine the remotely specified 
named anchor to determine a character string to 
search for. 

10. The system of claim 9 further comprising: 

a mechanism configured to search the retrieved data for 
the character string; and 

a mechanism configured to display the portion of the data 
containing the character string. 

11. A computer program product for facilitating access to 
a specified portion of data stored at a remote location, the 
computer program product comprising: 

code that retrieves a source document, the source document 
including hypertext links to other data on the 
network; 

code that displays the source document; 

code that receives input entered on the source document; 

code that determines whether the input comprises selection 
of a remotely specified named anchor; 

code that, when the input comprises selection of a 
remotely specified named anchor, retrieves data indicated 
in the remotely specified named anchor and 
displays a portion of the data specified in the remotely 
specified named anchor, wherein the specified portion 
of the data does not have a position marker associated 
with it, 

wherein the code resides on a tangible medium. 

12. The computer program product of claim 11 wherein 
the code that determines further comprises: 

code that examines the remotely specified named anchor 
to determine whether it contains an attribute indicating 
that a specified portion of the retrieved data should be 
displayed. 

13. The computer program product of claim 12 wherein 
the attribute is a SCROLL attribute. 

14. The computer program product of claim 11 wherein 
the code that displays the portion of the data specified in the 
remotely specified named anchor further comprises: 

code that examines the remotely specified named anchor 
to determine a character string to search for. 

15. The computer program product of claim 14 further 
comprising: 

code that searches the retrieved data for the character 
string; and 

code that displays the portion of the data containing the 
character string. 
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ABSTRACT 



A search function is provided for Internet searching capable 
of searching to greater depth than conventional search 
functions. The new function tests returned electronic docu- 
ments from a first search for a second search function, and, 
finding a second function, transfers at least a form of first 
search criteria into the second search function, then initiates 
the second function, and returns at least addresses of docu- 
ments found by the second function into the first function. In 
a preferred embodiment a search function according to the 
invention is provided by a subscription portal server, and 
operates by proxy, initiated and controlled by subscribers. In 
this form, primary searches may be limited to destinations 
registered to specific subscribers using the function. 

11 Claims, 11 Drawing Sheets 
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METHOD AND APPARATUS FOR 
EXTENDING AN ON-LINE INTERNET 
SEARCH BEYOND PRE-REFERENCED 
SOURCES AND RETURNING DATA OVER A 
DATA-PACKET-NETWORK (DPN) USING 
PRIVATE SEARCH ENGINES AS PROXY- 
ENGINES 

CROSS-REFERENCE TO RELATED 
DOCUMENTS 

The present invention is a continuation in part (CIP) to 
patent application Ser. No. 09/323,598 entitled "Method and 
Apparatus for Obtaining and Presenting WEB Summaries to 
Users" filed on Jun. 1, 1999, which is a CIP to patent 
application Ser. No. 09/208,740 entitled "Method and Appa- 
ratus for Providing and Maintaining a User-Interactive Por- 
tal System Accessible via Internet or other Switched-Packet- 
Network" filed on Dec. 8, 1998, disclosures of which are 
incorporated herein in their entirety by reference. 

FIELD OF THE INVENTION 

The present invention is in the field of Internet navigation 
and data gathering over a DPN network and pertains more 
particularly to a method and apparatus for searching for and 
returning data associated with URL's not indexed in tradi- 
tional search engine databases, by using secondary proxy 
engines. 

BACKGROUND OF THE INVENTION 

The information network known as the World Wide Web 
(WWW), which is a subset of the well-known Internet, is 
arguably the most complete source of publicly accessible 
information available. Anyone with a suitable Internet appli- 
ance such as a personal computer with a standard Internet 
connection may access (go on-line) and navigate to Univer- 
sal Resource Locators (URL's), also termed information 
pages or WEB pages, stored on Internet-connected servers 
for the purpose of garnering information and initiating 
transactions with hosts of such servers and pages. 

Many companies offer various subscription services 
accessible via the Internet. For example, many people now 
do their banking, stock trading, shopping, and so forth from 
the comfort of their own homes via Internet access. 
Typically, a user, through subscription, has access to per- 
sonalized and secure WEB pages for such functions. By 
typing in a user name and a password or other personal 
identification code, a user may obtain information, initiate 
transactions, buy stock, and accomplish a myriad of other 
tasks. 

One problem that is encountered by an individual who has 
several or many such subscriptions to Internet-brokered 
services is that there are invariably many passwords and/or 
log-in codes to be used. Often a same password or code 
cannot be used for every service, as the password or code 
may already be taken by another user. A user may not wish 
to supply a code unique to the user such as perhaps a social 
security number because of security issues, including quality 
of security, that may vary from service to service. 
Additionally, many users at their own volition may choose 
different passwords for different sites so as to have increased 
security, which in fact also increases the number of pass- 
words a user may have. 

Another issue that can plague a user who has many
passworded subscriptions is the fact that they must book-mark
many WEB pages in a computer cache so that they may quickly
find and access the various services. For example, in order to
reserve and pay for airline travel, a user must connect to the
Internet, go to his/her book-marks file and select an airline
page. The user then has to enter a user name and password, and
follow on-screen instructions once the page is delivered. If
the user wishes to purchase tickets from the WEB site, and
wishes to transfer funds from an on-line banking service, the
user must also look for and select the personal bank or
account page to initiate a funds transfer for the tickets.
Different user names and passwords may be required to access
these other pages, and things get quite complicated.

Although this preceding example is merely exemplary, it is
generally known that much work related to finding WEB pages,
logging in with passwords, and the like is required to
successfully do business on the WEB.

A service known to the inventor and described in the related
case listed under the cross-reference to related documents
section provides a WEB service that allows a user to store all
of his password protected pages in one location such that
browsing and garnering information from them is much
simplified. A feature of the above service allows a user to
program certain tasks into the system such that requested
tasks are executed by an agent (software) based on user
instruction. The service stores user password and log-in
information and uses the information to log-in to the user's
sites, thus enabling the user to navigate without having to
manually input log-in or password codes to gain access to the
links.

The above-described service uses a server to present a
user-personalized application that may be displayed as an
interactive home page that contains all of his listed sites
(hyperlinks) for easy navigation. The application lists the
user's URL's in the form of hyperlinks such that a user may
click on a hyperlink and navigate to the page wherein log-in,
if required, is automatic, and transparent to the user.

The application described above also includes a software
agent that may be programmed to perform scheduled tasks for
the user including returning specific summaries and updates
about user-account pages. A search function is provided and
adapted to cooperate with the software agent to search
user-entered URL's for specific content if such pages are
cached somewhere in their presentable form such as at the
portal server, or on the client's machine.

An enhancement to the personalized system described above
allows a software agent termed a gatherer agent (browser
navigation control) to, in cooperation with a search function,
navigate by proxy to any user-entered URL and return updated
data back to the user in the form of an HTML information page,
which appears in the user's browser window. The enhancement is
accomplished with the use of site-logic scripting based on
pre-known information about the URL or URL's from which a user
wishes to obtain data. In this way, current data specifically
requested by a user may be found and retrieved for the user.

The process described above is initiated by a user query that
is entered into a search function dialog box provided with a
user's personal portal page. The query may be presented in
natural language adding a level of user friendliness to the
process. Moreover, auto log-in to password protected sites may
be performed on behalf of users by virtue of the system's
compilation and storage of user and WEB-site related data.

A limitation exists in the personalized system described
above in that the search function may not search beyond the
indexed or known URLs listed in the service database and
attributed to a requesting user. That is to say that the search
function cannot proceed beyond the first level of WEB site
depth. Therefore, manual navigation must still be performed
by a user who desires to obtain data referenced at a deeper
level than in an indexed or registered URL. Of course, a user
practicing the above system may physically register any new
URLs with the service such that they may be included in the
search criteria and site logic may be developed for obtaining
summary data contained in the new URLs. However, when
performing a general search, a user may not know the URL
where the desired data is held.

In a general sense, as opposed to the personalized system
described above, the current technology of searching URLs
over a DPN such as the Internet with a search engine
involves entering a query into a search dialog box and
submitting the query to a server hosted by the provider of the
search function. The requested data is compared to data held
in a database containing cached URLs, which may contain
data matching a query. Matching URLs (URLs with data
content matching a query) are then returned to a user's
browser window for browsing and selection as is generally
known in the art.

There are many different methods and criteria used by
search engines for searching out data on the Internet. Most
typical is the use of key words or phrases that are used to find
matches in text contained on an indexed WEB page (URL).
Other methods include searching by site, searching for
video, searching for photographs, searching for audio, and
so on.

The above technology is limited in a general sense as
described above by the fact that not all URL pages containing
information which may be desired by a user are listed
in conventional search engine databases. In fact, there are a
vast number of URL resources that are maintained on the
Internet that are not listed in any search engine database and
therefore may not be found through a query-type search
method.

For example, a main page having a URL and hosted by an
enterprise may contain several links to pages that contain
additional information.

However, only the main page of the site is typically
indexed on any given search-engine list unless a host of the
site or other entity submits the additional URLs to be
included in a database held by the enterprise hosting the
search engine. Therefore, in order to obtain the additional
data from un-indexed sites, a user must navigate to
additional sites from a "jump-off page" found during the original
search.

Many enterprises, especially companies hosting many
pages, provide a convenient search engine function
embedded into a page at a main URL, which is indexed in the
conventional search-engine databases. In this way, a user
may search for the main site, invoke the returned link, and
then use the provided private search engine to explore the
additional pages or look for additional data related to the site
as a whole. Such a WEB site may be a company or enterprise
site comprising many related WEB pages.

When a user invokes a private search function on a main
page, he must enter a new query into the private search
engine to look for the additional data. He or she is no longer
using the original search engine to look for the data.
Moreover, the private search engine provided at a site's main
URL may function by different rules than the original search
engine requiring a user, in many cases, to restructure the
original query.

In the personalized system described further above
wherein pre-knowledge exists about the user and WEB page
site-logic is known pertaining to how data is hosted at the
service-site marked by a known URL, the limitation
described in a general data-search system still exists. That is,
URLs may not be found if they are not pre-known or
indexed.

What is clearly needed is a method and apparatus that
enables a search engine to find and obtain data from URLs
that are not indexed by a search engine database or otherwise
pre-known to a requesting user or search-hosting service. A
method and apparatus such as this would allow a user to
obtain data that would otherwise have to be obtained by
further browser navigation.

SUMMARY OF THE INVENTION

In a preferred embodiment of the present invention a
method for extending an on-line Internet search beyond
pre-referenced sources is provided, comprising steps of (a)
entering a first search criteria in a first search function; (b)
initiating the first search function; (c) returning in the first
search function a pre-referenced first document having data
associated with the first search criteria; (d) testing the first
document for an embedded second search function; (e) on
finding a second search function in the first document,
automatically entering at least a form of the first search
criteria in the second search function; and (f) returning
addresses in the first search function for documents found
through the second search function.

In one embodiment the first search function allows natural
language in entering search criteria, and further comprises a
parsing step for parsing criteria input for significant words
and phrases for criteria matches. The first function, in some
embodiments in step (e), tests the second search function for
criteria rules, and amends the first search criteria to conform
to the criteria rules.

In some embodiments the first search function is provided
by a subscription portal service, and is operated by proxy by
subscribers. In some of these embodiments the first search
function is limited in step (c) to returning first documents
pre-registered to a specific subscriber invoking the first
search function.

In another aspect of the invention an Internet search
application is provided, comprising a first search module
having a first criteria interface for entry of a first search
criteria; an inspection function for identifying a second
search module in a returned electronic document, the second
search module having a second criteria interface; and an
entry module for entering the first search criteria into the
search criteria interface of the second search module. The
new search function is characterized in that the search
application, upon entry of a first search criteria in the first
criteria interface, returns at least one electronic document
having a match to the first search criteria, inspects the
document for the second search module, and transfers at
least a form of the first search criteria into the second criteria
interface.

In preferred embodiments the Internet search application
further initiates the second search module after transfer of
search criteria, and returns at least addresses of documents
found by the second search function in the first search
function.

In some cases the first search module allows natural
language criteria entry, and parses entries for significant
words and phrases for matching to content in electronic
documents returned. In some cases as well, the first search
module, in step (e), tests the second search module for
criteria rules, and amends the first search criteria to conform
to the criteria rules.
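As an illustration only, steps (a) through (f) of the summarized method might be sketched in Python as follows. The patent gives no implementation, so everything here is an assumption: the embedded second search function is taken to be an HTML form with a named text input that accepts GET requests, and the names `SearchFormFinder`, `find_search_form`, `extended_search`, `first_search`, `fetch` and `run_second` are hypothetical. Natural-language parsing and the amendment of criteria to the second function's rules are left out.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urlencode, urljoin


class SearchFormFinder(HTMLParser):
    """Records the first <form> that contains a named text or search input."""

    def __init__(self) -> None:
        super().__init__()
        self._action: Optional[str] = None  # action of the form being parsed
        self.found: Optional[Dict[str, str]] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "form":
            self._action = attr.get("action") or ""
        elif tag == "input" and self._action is not None and self.found is None:
            kind = (attr.get("type") or "text").lower()
            if kind in ("text", "search") and attr.get("name"):
                self.found = {"action": self._action, "field": attr["name"]}

    def handle_endtag(self, tag):
        if tag == "form":
            self._action = None


def find_search_form(html: str) -> Optional[Dict[str, str]]:
    """Step (d): test a returned document for an embedded search function."""
    finder = SearchFormFinder()
    finder.feed(html)
    return finder.found


def extended_search(
    criteria: str,
    first_search: Callable[[str], List[str]],  # steps (a)-(c): indexed URLs
    fetch: Callable[[str], str],                # returns the HTML of a URL
    run_second: Callable[[str], List[str]],     # runs the embedded search
) -> List[str]:
    """Return indexed URLs plus addresses found through embedded searches."""
    results: List[str] = []
    for url in first_search(criteria):
        results.append(url)
        form = find_search_form(fetch(url))
        if form is None:
            continue
        # Step (e): enter the first criteria into the second search function.
        # Assumes a GET form whose action carries no query string of its own.
        query = urlencode({form["field"]: criteria})
        query_url = urljoin(url, form["action"]) + "?" + query
        # Step (f): return addresses found by the second function.
        results.extend(run_second(query_url))
    return results
```

Passing the network operations in as callables keeps the sketch free of any particular HTTP client; a real system would also need to handle POST forms and site-specific criteria rules.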
In some cases the first search module is provided by a
subscription portal service, and is operated by proxy by
subscribers, and in some of these embodiments the first
search module is limited to returning first documents
pre-registered to a specific subscriber invoking the first search
function.

In embodiments of the present invention taught in
enabling detail below, for the first time, a search function is
provided for Internet browsing that is capable of invoking
secondary and private search functions in documents
returned by a first search operation, and finding documents
at further depth through the invoked functions.

BRIEF DESCRIPTION OF THE DRAWING FIGURES

FIG. 1 is an overview of an Internet portal system and
network according to an embodiment of the present
invention.

FIG. 2 is an exemplary plan view of a personalized Portal
home page application as it may be seen on a display
monitor according to an embodiment of the present
invention.

FIG. 3 is a flow diagram illustrating user interaction with
the Internet portal of FIG. 1.

FIG. 4 is a block diagram illustrating a summarization
software agent and capabilities thereof according to an
embodiment of the present invention.

FIG. 5 is a logical flow chart illustrating an exemplary
summarization process performed by the software agent of
FIG. 4 operating in a user-defined mode.

FIG. 6 is a logical flow chart illustrating an exemplary
summarization process performed by the software agent of
FIG. 4 in a User-independent smart mode with minimum
user input.

FIG. 7 is an architectural overview of a system navigated
to search for data on a DPN network according to prior art.

FIG. 8 is an architectural overview of a system employing
a personalized search method for data on a DPN network
according to an embodiment of the present invention.

FIG. 9 is a block diagram illustrating software components
of a search function interface according to an embodiment
of the present invention.

FIG. 10 is a process flow diagram illustrating basic
interaction steps for practicing the present invention
according to a preferred embodiment.

FIG. 11 is a block diagram illustrating the standard
data-search system of FIG. 7 enhanced with the method and
apparatus of the present invention according to an alternate
embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

According to a preferred embodiment of the present
invention, a unique Internet portal is provided and adapted
to provide unique services to users who have obtained
access via an Internet or other network connection from an
Internet-capable appliance. Such an interface provides users
with a method for storing many personal WEB pages and
further provides search function and certain task-performing
functions. The methods and apparatus of the present
invention are taught in enabling detail below.

FIG. 1 is an overview of an Internet portal system 11 and
Internet network 13 according to an embodiment of the
present invention. Portal system 11, in this embodiment,
operates as an ISP in addition to a unique network portal, but
may, in other embodiments be implemented as a stand-alone
Internet server. In yet other embodiments the service and
apparatus described herein may also be provided by such as
a search and listing service (AltaVista™, Yahoo™) or by
any other enterprise hosting a WEB-connected server.

Internet 13 is representative of a preferred use of the
present invention, but should not be considered limiting, as
the invention could apply in other networks and
combinations of networks.

ISP 15 in this embodiment comprises a server 31, a
modem bank 33, represented here by a single modem, and
a mass repository 29 for digital data. The modem bank is a
convenience, as connection to the server could be by another
type of network link. ISP 15, as is typical in the art, provides
Internet access services for individual subscribers. In
addition to well-known Internet access services, ISP 15 also
provides a unique subscription service as an Internet portal
for the purpose of storing many WEB pages or destinations
along with any passwords and or personal codes associated
with those pages, in a manner described in more detail
below. This unique portal service is provided by execution of
Portal Software 35, which is termed by the inventors
Password-All. The software of the invention is referred to
herein both as the Portal Software, and as the Password-All
software suite. Also, in much of the description below, the
apparatus of the invention is referred to by the Password-All
terminology, such as the Password-All Server or
Password-All Portal.

ISP 15 is connected to Internet 13 as shown. Other
equipment known in the art to be present and connected to
a network such as Internet 13, for example, IP data routers,
data switches, gateway routers, and the like, are not
illustrated here but may be assumed to be present. Access to
ISP 15 is through a connection-oriented telephone system as is
known in the art, or through any other Internet/WEB access
connection, such as through a cable modem, special network
connection (e.g. T1), [illegible]. Such connection is
[illegible] 33.

In a preferred embodiment a user has access to Internet
Password-All Portal services by a user name and password
as is well known in the art, which provides an individualized
WEB page to the subscriber. In another embodiment
wherein a user has other individuals that use his or her
Internet account, then an additional password or code unique
to the user may be required before access to portal 31 is
granted. Such personalized Portal WEB pages may be stored
in repository 29, which may be any convenient form of mass
storage.

Three Internet servers 23, 25, and 27, are shown in
Internet 13, and represent Internet servers hosted by various
enterprises and subscribed to by a user operating appliance
17. For example, server 23 may be a bank server wherein
interactive on-line banking and account managing may be
performed. Server 25 may be an investment server wherein
investment accounts may be created and managed. Server 27
may be an airline or travel server wherein flights may be
booked, tickets may be purchased, and so on. In this
example, all three servers are secure servers requiring user
ID and password for access, but the invention is not
necessarily limited to just secure services.

In a preferred embodiment of the present invention, a
subscribing user operating an Internet-capable appliance,
such as appliance 17, connects to Password-All Portal
system 11 hosted by ISP 15, and thereby gains access to a
personalized, interactive WEB page, which in turn provides
access to any one of a number of servers on Internet 13 such
as servers 23, 25, and 27, without being required to enter
additional passwords or codes. In a preferred embodiment
the software that enables this service is termed Password-All
by the inventors. Password-All may be considered to be a
software suite executing on the unique server and in some
instances also on the user's station (client). Additional
interactivity provided by portal software 35 allows a
connected user to search his listed pages for information
associated with keywords, text strings, or the like, and allows a
user to program user-defined tasks involving access and
interaction with one or more Internet-connected servers such
as servers 23, 25, and 27 according to a pre-defined time
schedule. These functions are taught in enabling detail
below.

FIG. 2 is an illustration of a personalized portal page as
may be seen on a display monitor according to an
embodiment of the present invention, provided by Password-All
Portal software 35 executing on server 31, in response to
secure access by a subscriber. Page 32 presents an
interactive listing 34 of user-subscribed or member WEB pages,
identified in this example by URL, but which may also be
identified by any convenient pseudonym, preferably
descriptive, along with user name and typically encrypted
password information for each page. Listed in a first column
under destination, are exemplary destinations LBC.com, My
Bank.com, My Stocks.com, My shopping.com,
Mortgage.com, and Airline.com. These are but a few of
many exemplary destinations that may be present and listed
as such on page 33. In order to view additional listings listed
but not immediately viewable from within application 33, a
scroll bar 35 is provided and adapted to allow a user to scroll
up or down the list to enable viewing as is known in the art.

Items listed in list 34 in this example may be considered
destinations on such as servers 23, 25, and 27 of FIG. 1.
Typically the URL associated with an item on this list will
not take a user to a server, per se, but to a page stored on a
server. User names and password data associated with each
item in list 34 are illustrated in respective columns labeled
user name, and password, to the right of the column labeled
destination. Each listing, or at least a portion of each listing,
is a hyperlink invoking, when selected, the URL to that
destination. In some instances a particular service may have
more than one associated URL. For example, My Bank.com
may have more than one URL associated for such as
different accounts or businesses associated also with a single
subscriber. In this case there may be a sub-listing for
different destinations associated with a single higher-level
listing. This expedient is not shown, but given this teaching
the mechanism will be apparent to those with skill in the art.

In some embodiments one page 33 may be shared by more
than one user, such as a husband and wife sharing a common
account and subscription. An instance of this is illustrated
herein with respect to the server labeled Mortgage.com
wherein both a John and a Jane Doe are listed together under
the column labeled user name. In another embodiment, a
network of individuals, perhaps business owners, authorized
co-workers, investment parties, or the like may share one
application. In this way, system 11 may be adapted for
private individuals as well as business uses.

After gaining access to application 33 which is served via
Internet portal server 31 of FIG. 1, a user may scroll,
highlight, and select any URL in his or her list 34 for the
purpose of navigation to that particular destination for
further interaction. Application 33 already has each
password and user name listed for each URL. It is not necessary,
however, that the password and user name be displayed for
a user or users. These may well be stored transparently in a
user's profile, and invoked as needed as a user makes
selections. Therefore, a user is spared the need of entering
passwords and user names for any destinations enabled by
list 34. Of course, each list 34 is built, configured and
maintained by a subscribing user or users, and an editing
facility is also provided wherein a user may edit and update
listings, including changing URL's, adding and deleting
listings, and the like.

In another aspect of the invention new listings for a user's
profile, such as a new passthrough to a bank or other
enterprise page, may be added semi-automatically as
follows: Typically, when a user opens a new account with an
enterprise through interaction with a WEB page hosted by
the enterprise, the user is required to provide certain
information, which will typically include such as the user's
ID, address, e-mail account, and so forth, and typically a
new user name and password to access the account. In this
process the user will be interacting with the enterprise's
page from his/her browser. A Password-All plug-in is
provided wherein, after entering the required information for
[illegible] signal (right click, key stroke, etc.), and the
Password-All suite [illegible] enter [illegible] in the user's
Password-All profile at the Password-All Portal server.

In a related method for new entries, the enterprise hosting
the Password-All Portal may, by agreement with other
enterprises, provide log-in and sign-up services at the
Password-All Portal, with most action transparent to the
user. For example, there may be, at the Password-All Portal,
a selectable browser list of cooperating enterprises, such as
banks, security services, and the like, and a user having a
Password-All Portal subscription and profile may select
among the cooperating enterprises and open new accounts,
which will simultaneously and automatically be added to the
Password-All Portal page for the user and to the server
hosted by the cooperating enterprise. There may be some
interactivity required for different accounts, but in the main,
much information from the user's profile may be used
directly without being re-entered.

The inventors have anticipated that many potential users
may be suspicious of providing passwords and user
names to an enterprise hosting a Password-All Portal Server
executing a service like Password-All according to
embodiments of the present invention. To accommodate this
problem, in preferred embodiments, it is not necessary that
the user provide the cleartext password to Password-All.
Instead, an encrypted version of each password is provided.
When a user links to his passthrough page in Password-All
at the Password-All Portal server, when he/she invokes a
hyperlink, the encrypted password is returned to the user's
system, which then, by virtue of the kept encryption key or
master Password, invokes the true and necessary password
for connection to the selected destination. It is thus not
necessary that cleartext passwords be stored at the
Password-All Portal server, where they may be vulnerable to
attack from outside sources, or to perceived misuse in other
ways as well.

In a related safety measure, in a preferred embodiment of
the invention, a user's complete profile is never stored on a
single server, but is distributed over two or more, preferably
more, servers, so any problem with any one server will
minimize the overall effect for any particular user.

Password-All, as described above, allows a user to access
a complete list of the user's usual cyberspace destinations,
complete with necessary log-on data, stored in an encrypted
fashion, so a user may simply select a destination (a 
hyperlink) in the Password-All list, and the user's browser 
then invokes the URL for the selected destination. In an 
added feature, Password-All may display banner ads and 
other types of advertisement during the navigation time 
between a hyperlink being invoked and the time the desti- 
nation WEB page is displayed. 

In yet another embodiment of the invention, a
user/subscriber need not access the Password-All page to enjoy
the advantages of the unique features provided. In this
variation, a plug-in is provided for the subscriber's WEB
browser. If the subscriber navigates by use of the local
browser to a WEB page requiring a secure log-in, such as
his/her on-line banking destination, when the subscriber is
presented with an input window for ID and Password, the
plug-in may be activated by a predetermined user input, such
as a hot key or right click of the mouse device. The plug-in
then accesses, transparently, the Password-All page (which
may be cached at the client), and automatically accesses and
provides the needed data for log-on.
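A minimal sketch of the plug-in behavior just described, in Python, is given below for illustration only. It assumes the cached Password-All page can be represented as a mapping from host name to stored log-on data; the names `Credentials`, `lookup_login`, `on_hot_key` and `fill_field`, and the field labels `"id"` and `"password"`, are hypothetical and do not come from the specification. Decryption of the stored password with the user's key is omitted.

```python
from typing import Callable, Dict, NamedTuple, Optional
from urllib.parse import urlsplit


class Credentials(NamedTuple):
    user_name: str
    password: str  # assumed already decrypted on the client


def lookup_login(
    page_url: str, cached_profile: Dict[str, Credentials]
) -> Optional[Credentials]:
    """Return stored log-on data for the host of page_url, if listed."""
    host = (urlsplit(page_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return cached_profile.get(host)


def on_hot_key(
    page_url: str,
    cached_profile: Dict[str, Credentials],
    fill_field: Callable[[str, str], None],  # hypothetical browser hook
) -> bool:
    """Fill the log-in window when the user presses the hot key."""
    creds = lookup_login(page_url, cached_profile)
    if creds is None:
        return False  # destination is not in the user's list
    fill_field("id", creds.user_name)
    fill_field("password", creds.password)
    return True
```

Keying the profile by host name is one possible design; a profile keyed by full URL would distinguish several log-in pages on the same host.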

In yet another aspect of the invention a search option 37 
allows a user to search list 34 for specific URL's based on 
typed input such as keywords or the like. In some cases, the 
number of URL's stored in list 34 can be extensive making 
a search function such as function 37 an attractive option. A 
criteria dialog box 51 illustrated as logically separated from 
and below list 34 is provided and adapted to accept input for 
search option 37 as is known in the art. In one embodiment, 
search option 37 may bring up a second window wherein a 
dialog box such as box 51 could be located. 

In another aspect of the invention the search function may
also be configured in a window invoked from window 33,
and caused to search all or selected ones of listed
destinations, and to return results in a manner that may be,
at least to some extent, configured by a user. For example,
a dialog box may be presented wherein a user may enter a
search criteria, and select among all of the listed
destinations. The search will then access each of the selected
destinations in turn, and the result may be presented to the
user as each instance of the criteria is found, or results may
be listed in a manner to be accessed after the search.
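The two ways of presenting results described above (as each instance is found, or listed after the search) can be illustrated with a Python generator. This is a sketch under assumptions: the criteria are treated as a plain case-insensitive substring, and `search_destinations` and `fetch_text` are hypothetical names, not part of the specification.

```python
from typing import Callable, Iterable, Iterator, Tuple


def search_destinations(
    criteria: str,
    selected: Iterable[str],
    fetch_text: Callable[[str], str],  # returns the text of a destination
) -> Iterator[Tuple[str, int]]:
    """Access each selected destination in turn and yield (url, offset)
    for every instance of the criteria, as soon as it is found."""
    needle = criteria.lower()
    if not needle:
        return
    for url in selected:
        text = fetch_text(url).lower()
        pos = text.find(needle)
        while pos != -1:
            yield url, pos
            pos = text.find(needle, pos + 1)
```

Iterating over the generator presents each hit as it is found; wrapping the call in `list(...)` collects the hits into a listing to be accessed after the search completes.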

Preferably the search function is a part of the Password- 
All Portal software, available for all users, and may be 
accessed by hyperlinks in user's personal pages. In some 
embodiments users may create highly individualized search 
functions that may be stored in a manner to be usable only 
by the user who creates such a function. 

In many aspects of the present invention, knowledge of 
specific WEB pages, and certain types of WEB pages, is 
highly desirable. In many embodiments characteristics of 
destination WEB pages are researched by persons 
(facilitators) maintaining and enhancing Password-All Por- 
tal software 35, and many characteristics may be provided in 
configuration modules for users to accomplish specific tasks. 
In most cases these characteristics are invoked and incor- 
porated transparent to the user. 

In yet another aspect of the present invention, the 
Password-All suite is structured to provide periodic reports 
to a user, in a manner to be structured and timed by the user, 
through the user's profile. For example, reports of changes 
in account balances in bank accounts, stock purchases, stock 
values, total airline travel purchases, frequent-flier miles, 
and the like may be summarized and provided to the users 
in many different ways. Because the Password-All Portal
server with the Password-All software site handles a broad
variety of transactional traffic for a user, there is an
opportunity to summarize, collect, and process statistics in
many useful ways. In preferred embodiments of the invention
such reports may be furnished and implemented in a
number of different ways, including being displayed on the
user's secure personal WEB page on the Password-All
Portal.

In addition to the ability to perform tasks as described
above, task results including reports, and hard documents
such as airline tickets, may be sent over the Internet or other
data packet-networks to user-defined destinations such as
fax machines, connected computer nodes, e-mail servers,
and other Internet-connected appliances. All tasks may be
set up and caused to run according to user-defined schedules
while the user is doing something else or is otherwise not
engaged with the scheduled task.

In another embodiment of the present invention, recognizing
the increasing use of the Internet for fiscal
transactions, such as purchasing goods and services, a
facility is provided in a user's profile to automatically track
transactions made at various destinations, and to authorize
payment either on a transaction-by-transaction basis, or after
a session, using access to the user's bank accounts, all of
which may be pre-programmed and authorized by the user.

Other functions or options illustrated as part of application
35 include a last URL option 41, an update function 43,
and an add function 45. Function 41 allows a user to
immediately navigate to a last visited URL. Update function
43 provides a means of updating URL's for content and new
address. Add function 45 enables a user to add additional
URL's to list 34. Similarly, function 45 may also provide a
means to delete entries. Other ways to add accounts are
described above. It should be noted that the services provided
by the unique Password-All Portal in embodiments of
the present invention, and by the Password-All software
suite, are not limited to destinations requiring passwords and
user names. The Password-All Portal and software in many
embodiments may also be used to manage all of a user's
bookmarks, including editing of bookmarks and the like. In
this aspect, bookmarks will typically be presented in
indexed, grouped, and hierarchical ways.

There are editing features provided with Password-All for
adding, acquiring, deleting, and otherwise managing bookmarks.
As a convenience, in many embodiments of the
invention, bookmarks may be downloaded from a user's
Password-All site, and loaded onto the same user's local
browser. In this manner, additions and improvements in the
bookmark set for a user may be used without the necessity
of going to Password-All. Further, bookmarks may be
uploaded from a user's local PC to his/her home page on the
Password-All site by use of one or more Password-All
plug-ins.

It will be apparent to the skilled artisan, given the teaching
herein, that the functionality provided in various embodiments
of the invention is especially applicable to Internet-capable
appliances that may be limited in input capability.
For example, a set-top box in a WEB TV application may
well be without a keyboard for entering IDs and Passwords
and the like. In practice of the present invention keyboard
entry is minimized or eliminated. The same comments apply
to many other sorts of Internet appliances.

In preferred embodiments of the invention, once a
subscriber-user is in Password-All, only an ability to
point-and-click is needed for all navigation. To get into the
Password-All site, using a limited apparatus, such as an
appliance without a keyboard or keypad, a Smartcard or
embedded password may be used, or some other type of
authentication.

It will be apparent to one with skill in the art that an
interactive application such as application 33 may be provided
in a form other than a WEB page without departing
from the spirit and scope of the present invention. For
example, an application such as application 33 may be
provided as a downloadable module or program that may be
set-up and configured off-line and made operational when
on-line.

FIG. 3 is a flow diagram illustrating user interaction with
the Internet Password-All Portal of FIG. 1. The following
process steps illustrated, according to an embodiment of the
present invention, are intended to illustrate exemplary
user-steps and automated software processes that may be initiated
and invoked during interaction with an Internet portal of the
present invention such as portal 31 of FIG. 1. In step 53 a
user connects to the Internet or another previously described
switched-packet network via a compatible appliance such as
Internet appliance 17 of FIG. 1.

At step 55, a user enters a user-name and password,
which, in one embodiment, may simply be his ISP user name
and password. In another embodiment, a second password or
code would be required to access an Internet portal such as
portal server 31 of FIG. 1 after logging onto the Internet
through the ISP. In some cases, having a special arrangement
with the ISP, there may be one password for both Internet
access through the ISP and for Password-All. At step 57 a
personal WEB page such as page 32 of FIG. 2 is displayed
via Internet portal server 31. At minimum, the personalized
WEB page will contain all user configured URL's, and may
also be enhanced by a search function, among other
possibilities.

In step 58 a user will, minimally, select a URL from his or
her bookmarked destinations, and as is known by hyperlink
technology, the transparent URL will be invoked, and the
user will navigate to that destination for the purpose of
normal user interaction. In this action, the Password-All
Portal software transparently logs the user on to the
destination page, if such log-on is needed.

At step 60 the user invokes a search engine by clicking on
an option such as described option 37 of FIG. 2. At step 62,
the user inputs search parameters into a provided text field
such as text field 51 of FIG. 2. After inputting such
parameters, the user starts the search by a button such as
button 52. The search engine extracts information in step 64.
Such information may be, in one option, of the form of
URL's fitting the description provided by search parameters.
A searched list of URL's may be presented in a separate
generated page in step 66 after which a user may select
which URL to navigate to. In an optional search function,
the user may provide search criteria, and search any or all of
the possible destinations for the criteria.

In another embodiment wherein WEB pages are cached in
their presentable form, information extracted in step 64 may
include any information contained in any of the stored pages
such as text, pictures, interactive content, or the like. In this
case, one displayed result page may provide generated links
to search results that include the URL associated with the
results. Perhaps by clicking on a text or graphic result, the
associated WEB page will be displayed for the user with the
result highlighted and in view with regard to the display
window.

Enhanced Agent for WEB Summaries 

In another aspect of the present invention, a software
agent, termed a gatherer by the inventors, is adapted to
gather and return summary information about URL's
according to user request or enterprise discretion. This is
accomplished in embodiments of the present invention by a
unique scripting and language parsing method provided by
the inventor wherein human knowledge workers associated
with the service provide written scripts to such a gatherer
according to subscriber or enterprise directives. Such a
software gatherer, and capabilities thereof, is described in
enabling detail below.

Referring now to FIG. 1, there is illustrated an exemplary 
architecture representing a portal service-network which, in
this case, is hosted by ISP 15. Portal software 35 in this
embodiment executes on portal server 31 set-up at the ISP 
location. Mass repository 29 is used for storing subscriber 
information such as passwords, log-in names, and the like. 
Internet servers 23, 25, and 27 represent servers that are 
adapted to serve WEB pages of enterprises patronized by a 
subscriber to the portal service such as one operating Inter- 
net appliance 17. 

The main purpose of portal software 35 as described 
above with reference to FIG. 2, is to provide an interactive 
application that lists all of the subscriber's WEB sites in the 
form of hyperlinks. When a user invokes a hyperlink from 
his personal list, software 35 uses the subscriber's personal 
information to provide an automatic and transparent log-in 
function for the subscriber while jumping the subscriber to 
the subject destination. 
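The transparent log-in described above might be pictured with the following Python sketch. It is an illustration under stated assumptions only: the `Credential` class, the `build_login_request` function, the form field names `user` and `pass`, and the example URL are hypothetical, and the sketch omits the encryption and secure storage that a real portal would need.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urlencode


@dataclass
class Credential:
    username: str
    password: str


def build_login_request(url: str, store: Dict[str, Credential]) -> Optional[str]:
    """Return a form-encoded log-in body for `url`, or None if no log-in is stored.

    `store` stands in for the subscriber information held in the repository.
    """
    cred = store.get(url)
    if cred is None:
        return None                  # destination needs no log-in; just navigate
    # Hypothetical field names; each real site defines its own form fields.
    return urlencode({"user": cred.username, "pass": cred.password})


store = {"https://shop.example/login": Credential("alice", "s3cret")}
body = build_login_request("https://shop.example/login", store)
```

When the subscriber invokes a hyperlink, the portal would submit such a body on the subscriber's behalf before presenting the destination page.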

Referring again to FIG. 2, an interactive list 34 containing 
user-entered hyperlinks and a set of interactive tools is 
displayed to a subscriber by portal software 35 of FIG. 1. 
One of the tools available to a subscriber interacting with list 
34 is agent (software) 39. Agent 39 may be programmed to 
perform certain tasks such as obtaining account information, 
executing simple transactions, returning user-requested noti- 
fication information about upcoming events, and so on. 
Search function 37 and update function 43 may be integrated 
with agent 39 as required to aid in functionality. 

It is described in the above disclosure that agent 39 may, 
in some embodiments, search for and return certain sum- 
mary information contained on user-subscribed WEB pages, 
such as account summaries, order tracking information and 
certain other information according to user-defined param- 
eters. This feature may be programmed by a user to work on 
a periodic time schedule, or on demand. 

In the following disclosure, enhancements are provided to 
agent 39. Such enhancements, described in detail below, 
may be integrated into agent 39 of portal software 35 (FIGS.
1 and 2); or may be provided as a separate agent or
gatherer to run with portal software 35; or may, in some
embodiments, be provided as a standalone service that is 
separate from portal software 35. 

FIG. 4 is a block diagram illustrating a summarization 
software agent 67 and various capabilities and layers thereof 
according to an embodiment of the present invention. Sum- 
marization agent 67, hereinafter termed gatherer 67, is a 
programmable and interactive software application adapted 
to run on a network server. Gatherer 67 may, in one 
embodiment, be integrated with portal software 35 of FIG. 
1 and be provided in the form of a software module separate 
from agent 39 (FIG. 2). In another embodiment, gatherer 67 
may be a part of agent 39 as an enhancement to the function 
of that agent as previously described. In still another 
embodiment, gatherer 67 may be provided as a parent or 
client-side application controlled by a separate service from 
the portal service described above. 

In this exemplary embodiment gatherer 67 is a multi-featured
software application having a variety of sub-modules
and interface modules incorporated therein to provide
enhanced function. Gatherer 67 has a client/service
interface layer 69 adapted to enable directive input from
both a client (user) and a knowledge worker or workers
associated with the service. A browser interface 77 is provided
in layer 69, and adapted to provide access to application
67 from a browser running on a client's PC or other
Internet or network appliance. Interface 77 facilitates
bi-directional communication with a user's browser application
(not shown) for the purpose of allowing the user to
input summary requests into gatherer 67 and receive summary
results. Interface 77 supports all existing network
communication protocols such as may be known in the art,
and may be adapted to support future protocols.

Layer 69 also comprises a unique input scripting module
79 that is adapted to allow a human knowledge worker to
create and supply directive scripts containing the site logic
needed by gatherer 67 to find and retrieve data from a WEB
site. In this case, gatherer 67 executes and runs on a network
server such as server 31 of FIG. 1. However, this is not
required in order to practice the present invention.

It is assumed in this example that gatherer 67 is part of the
portal software suite 35 running on server 31 of FIG. 1.
Gatherer 67 may be provided as several dedicated agents, or
as one multi-functional agent without departing from the
spirit and scope of the present invention. For example, one
gatherer 67 may be scripted and programmed to execute a
single user request with additional gatherers 67 called upon
to perform additional user-requests. Alternatively, one gatherer
67 may be dedicated and assigned to each individual
user and adapted to handle all requests from that user.

Interface layer 69 facilitates exchange of information
from both a client and a knowledge worker. A client operating
a WEB browser with an appropriate plug-in is enabled
to communicate and interact with gatherer 67. For example,
a user may enter a request to return a summary of pricing for
all apartments renting for under $1000.00 per month located
in a given area (defined by the user) from apartments.com
(one of user's registered WEB sites). The just mentioned
request would be categorized as either a periodic request, or
a one time (on demand) request. The communicated request
initiates a service action wherein a knowledge worker associated
with the service uses module 79 to set-up gatherer 67
to perform its function. Module 79 is typically executed
from a network-connected PC operated by the knowledge
worker.

According to an embodiment of the present invention, a
unique scripting method facilitated by module 79 is provided
to enable gatherer 67 to obtain the goal information
requested by a user. For example, the above mentioned
example of WEB-site apartments.com has a specific HTML
(hyper-text-markup-language) logic that it uses to create its
site and post its information. Such site logic is relatively
standard fare for a majority of different sites hosted by
different entities. Using this knowledge, a knowledge
worker creates a site-specific script or template for gatherer
67 to follow. Such a template contains descriptions and
locations of the appropriate fields used, for example, at
apartments.com. Apartment description, location, deposit
information, rental information, agent contact information,
and other related fields are matched in terms of location and
label description on the template created with module 79.
Completed templates are stored in a database contained in a
storage facility such as, perhaps, repository 29 of FIG. 1.
Such templates may be reused and may be updated (edited)
with new data.

In one embodiment, one script may contain site logics for
a plurality of WEB pages, and specific navigational
instructions and password or log-in information
may be contained therein and executed serially, such as one
site at a time. It is important to note that the knowledge
worker or workers may perform much of their scripting via
automatic controls such as by object linking and embedding
(OLE), and a minor portion of scripting may be performed
manually in an appropriate computer language, many of
which are known in the art.

Gatherer 67 also has a process layer 71 adapted for
internal information gathering and parameter configuration.
An optional portal server interface 81 is provided and
adapted to allow gatherer 67 to provide updated information to
a user's list of hyperlinks and also to obtain data from portal
server 31 if required. For example, required hyperlinks may
be mirrored from a user's home page to a scripting template
for navigational purposes. In an embodiment wherein gatherer
67 is part of a standalone service, a convention for
providing user log-in information may be supplied at the
client's end when a request is made. For example, an
encrypted password may be supplied by a client plug-in and
gatherer 67 may temporarily borrow the user's encryption
key when auto log-in is performed.

An appliance configuration module 83 is provided and
adapted to allow a user to define and configure an Internet
appliance to communicate with the service and receive
summary information. Such appliances may include but are
not limited to palm top PC's, lap top PC's, cellular
telephones, WEB TV's, and so on. Typically, a user will be
presented a configuration WEB page from a network server
that displays in his browser window on his desktop PC. The
page contains an interface for communicating device parameters
and communication protocol types to module 83. In
this way, a user may configure a preferred device for receipt
of summary information. Device parameters and communication
protocols inherent to such a device are incorporated
into the scripting of the site template and are used as
instructions for WEB summary delivery.

A navigation layer 73 is provided and adapted to perform
the function of external site navigation and data gathering
for gatherer 67. To this end, a communication interface/
browser control module 85 is provided and adapted to
function as a WEB browser to access WEB sites containing
WEB data. Control 85 receives its instruction from the
scripted template created by the knowledge worker.

A parsing engine 87 is provided and adapted to parse
individual WEB sites according to a template created via
scripting module 79. Parsing engine 87 may be a Perl
engine, an IE HTML engine, or any other or combination of
known parsing engines. The template (not shown) tells
control 85 and parsing engine 87 where to go and what fields
at the destination site to look for to access desired data. Once
the data fields are located, parsing engine 87 gathers current
data in the appropriate field, and returns that data to the
service for further processing such as data conversion,
compression and storage, and the like.

Because WEB sites use tools that use consistent logic in
setting up their sites, this logic may be used by the
summarization service to instruct control 83 and parsing engine 87.
The inventor provides herein an exemplary script logic for
navigating to and garnering data from amazon™.com. The
hyperlinks and/or actual URL's required for navigation are
not shown, but may be assumed to be included in the
template script. In this example, a company name Yodlee
(known to the inventors) is used in the script for naming
object holders and object containers, which are in this case
Active X™ conventions. In another embodiment, Java™
script or another object linking control may be used. The
scripted template logic example is as follows:



    # Site amazon.orders.x - shows status of orders from Amazon
    login( 7 );
    get( "/exec/obidos/order-list/" );
    my @tables = get_tables_containing_text( "Orders:" );
    my $order_list = new Yodlee::ObjectHolder( 'orders' );
    $order_list->source( 'amazon' );
    $order_list->link_info( get_link_info() );
    my @href_list;
    my @container_list;

    # First pass: collect one container per order row.
    foreach my $table ( @tables ) {
        my @rows = get_table_rows();
        foreach my $i ( 0 .. $#rows ) {
            select_row( $i );
            my $text = get_text( $rows[ $i ] );
            next if $text =~ /Orders:|Status/;    # skip header rows
            my @items = get_row_items();
            next unless @items >= 4;
            my( $order_num, $date, $status );
            select_cell( 1 );
            $order_num = get_cell_text();
            my $href = get_url_of_first_href( get_cell() );
            select_cell( 2 );
            $date = get_cell_text();
            select_cell( 3 );
            $status = get_cell_text();
            next unless defined $order_num and defined $date and
                        defined $status;
            my $order = new Yodlee::Container( 'orders' );
            $order->order_number( $order_num );
            $order->date( $date );
            $order->status( $status );
            $order_list->push_object( $order );
            if( defined $href ) {
                push( @href_list, $href );
                push( @container_list, $order );
            }
        }
    }

    # Second pass: follow each order link and record the item description.
    foreach my $i ( 0 .. $#href_list ) {
        get( $href_list[ $i ] );
        @tables = get_tables_containing_text( "Items Ordered:" );
        foreach my $table ( @tables ) {
            my @rows = get_table_rows();
            foreach my $j ( 0 .. $#rows ) {
                select_row( $j );
                my $href = get_url_of_first_href( get_row() );
                next unless defined $href;
                my @child_list = get_children( get_row(), 'a' );
                next unless defined $child_list[ 0 ];
                my $text = get_text( $child_list[ 0 ] );
                $container_list[ $i ]->description( $text );
            }
        }
    }
    result( $order_list );



The above example is a script that instructs control 85 and
parser 87 to navigate to and obtain data from
Amazon™.com, specifically that data that reflects the user's
current order status. Scripts may also be written to obtain
virtually any type of text information available from any
site. For example, a user may wish to obtain the New York
Times headlines, the top ten performing stocks, a comparative
list of flights from San Francisco to New York, etc. In
one embodiment, metadata may be associated with and used 
in-place of the actual scripted language for the purpose of 
reducing complication in the case of many scripts on one 
template. 
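The idea of a template that tells a parser which fields to pull from a table can also be sketched with the Python standard library alone. The following is an illustrative sketch and not the service's scripting API: the `TableRows` class, the `extract_orders` function, the dictionary-style template mapping field names to column positions, and the sample HTML are all assumptions introduced here.

```python
from html.parser import HTMLParser
from typing import Dict, List


class TableRows(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])              # start a new row
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")          # start a new cell in the current row
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def extract_orders(html: str, template: Dict[str, int]) -> List[Dict[str, str]]:
    """Apply a template (field name -> column index) to every data row."""
    parser = TableRows()
    parser.feed(html)
    orders = []
    for cells in parser.rows:
        if len(cells) <= max(template.values()):
            continue                          # row too short to hold every field
        if "Status" in cells:
            continue                          # header row, as skipped in the script above
        orders.append({name: cells[col] for name, col in template.items()})
    return orders


SAMPLE = """
<table>
<tr><th>Order</th><th>Date</th><th>Status</th></tr>
<tr><td>002-1</td><td>1999-10-01</td><td>Shipped</td></tr>
<tr><td>spacer</td></tr>
</table>
"""
orders = extract_orders(SAMPLE, {"order_number": 0, "date": 1, "status": 2})
```

Keeping the column positions in the template rather than in the parser means a change in site layout calls for a template edit only, which mirrors the template update described in the text.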

A data processing layer 75 is provided and adapted to
store, process, and present returned data to users according
to enterprise rules and client direction. A database interface
module 89 is provided and adapted to provide access for
gatherer 67 to a mass repository such as repository 29 of
FIG. 1, for the purpose of storing and retrieving summary
data, templates, presentation directives, and so on. Gatherer
agent 67 may also access data through interface 89 such as
profile information, user account and URL information,
stored site logics, and so on. Data scanned from the WEB is
stored in a canonical format in a database such as repository
29, or in another connected storage facility. All stored data
is, of course, associated with an individual who requested it,
or for whom the data is made available according to
enterprise discretion.

A summarization page module 91 is provided and adapted
to organize and serve a WEB summary page to a user.
Module 91, in some embodiments, may immediately push a
WEB summary to a user, or module 91 may store such
summarized pages for a user to access via a pull method, in
which case a notification may be sent to the user alerting him
of the summary page availability. Summarization module 91
includes an HTML renderer that is able to format data into
HTML format for WEB page display. In this way, e-mail
messages and the like may be presented as HTML text on a
user's summarization page. Moreover, any summary data
from any site may include an embedded hyperlink to that
site. In this way, a user looking at an e-mail text in HTML
may click on it and launch the appropriate e-mail program.
Other sites will, by default, be linked through the summary
page.
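A renderer of the kind described, which formats summary data as HTML with an embedded hyperlink back to each source site, might look like the following Python sketch. The function name `render_summary_page`, the tuple layout, and the example site are assumptions made for illustration; the actual page layout of module 91 is not specified in this description.

```python
from html import escape
from typing import List, Tuple


def render_summary_page(items: List[Tuple[str, str, str]]) -> str:
    """Render (site name, site URL, summary text) tuples as a small HTML page.

    Each entry carries a hyperlink back to the site the data came from.
    """
    rows = [
        '<li><a href="{}">{}</a>: {}</li>'.format(
            escape(url, quote=True), escape(name), escape(text))
        for name, url, text in items
    ]
    return "<html><body><ul>\n" + "\n".join(rows) + "\n</ul></body></html>"


page = render_summary_page([
    ("Bank", "https://bank.example/", "Balance < $100"),
])
```

Escaping the gathered text before embedding it keeps characters such as `<` in scraped data from being interpreted as markup on the summary page.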

Many users will access their summary data through a
WEB page as described above; however, this is not required
in order to practice the present invention. In some
embodiments, users will want their summary information
formatted and delivered to one of a variety of Internet-capable
appliances such as a palm top or, perhaps, a cell
phone. To this end, the renderer is capable of formatting and
presenting the summary data into a number of formats
specific to alternative devices. Examples of different known
formats include, but are not limited to, XML, plain text,
VoxML, HDML, audio, video, and so on.
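Rendering the same summary data into more than one device format can be sketched as a simple dispatch on a format name. The sketch below covers only plain text and XML, uses Python's standard `xml.etree.ElementTree` module, and introduces a hypothetical `render` function and element names (`summary`, `item`) that do not come from this description.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def render(summary: Dict[str, str], fmt: str) -> str:
    """Format one summary for a target device; only two formats are sketched."""
    if fmt == "text":
        return "\n".join(f"{key}: {value}" for key, value in summary.items())
    if fmt == "xml":
        root = ET.Element("summary")
        for key, value in summary.items():
            ET.SubElement(root, "item", name=key).text = value
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")


as_text = render({"balance": "100"}, "text")
as_xml = render({"balance": "100"}, "xml")
```

Additional formats would be added as further branches, selected from the device parameters a user configures for a preferred appliance.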

In a preferred embodiment of the present invention, gatherer
67 is flexible in such a way that it may act according to
enterprise rules, client directives, or a combination of the
two. For example, if a user makes a request for summary
data about a user-subscribed WEB page to be periodically
executed and presented in the form of an HTML document,
then gatherer 67 would automatically access and analyze the
required internal information and user provided information
to formulate a directive. Using scripting module 79, a
knowledge worker provides a template (if one is not already
created for that site) that contains the "where to go" and
"what to get" information according to site logic, user input,
and known information.

Alternatively, if a user requests a summary about data on
one of his sites such as, perhaps, current interest rates and
re-finance costs at his mortgage site, the service may at its
own discretion provide an additional unsolicited summary
from an alternate mortgage site for comparison. This type of
summarization would be designed to enhance a user's
position based on his profile information. In this case,
updated data about latest interest rates, stock performances,
car prices, airline ticket discounts, and so on would be stored
by the service for comparative purposes. If a user request for
a summary can be equaled or bettered in terms of any
advantage to the user, such summary data may be included.
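The "equaled or bettered" test for including an unsolicited comparison can be illustrated with a short Python sketch. The function `comparable_offers`, the lender names, and the rates are invented for illustration, and treating a lower number as better is an assumption that suits an interest rate but not every kind of data.

```python
from typing import Dict, List, Tuple


def comparable_offers(user_rate: float,
                      alternatives: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return alternate sites whose rate equals or betters the user's rate.

    Lower is treated as better here, as for a mortgage interest rate.
    """
    matches = [(site, rate) for site, rate in alternatives.items()
               if rate <= user_rate]
    return sorted(matches, key=lambda pair: pair[1])   # best offer first


offers = comparable_offers(
    7.25,
    {"lender-a.example": 7.50, "lender-b.example": 7.00, "lender-c.example": 7.25},
)
```

Only the entries returned here would be candidates for inclusion alongside the summary the user actually requested.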

In many cases, created templates may be re-used unless a
WEB site changes its site logic parameters, in which case
the new logic must be accessed and any existing templates
must be updated, or a new template may be created for the
site. The templates contain site-specific script obtained from
the site and stored by the knowledge workers. In one
embodiment, companies hosting WEB pages automatically
provide their site logics and any logic updates to the service
by virtue of an agreement between the service and the WEB
hosts.

In an alternative embodiment gatherer 67 may be imple- 
mented as a client application installed on a user's PC. In 
this embodiment, a user would not be required to supply 
log-in or password codes. Summarization scripts may be 
sent to the client software and templates may be automati- 
cally created with the appropriate scripts using log-in and 
password information encrypted and stored locally on the 
user's machine. 

In addition to providing WEB summary information,
gatherer 67 may also be used to provide services such as automatic
registration to new sites, and for updating old registration
information to existing sites. For example, if a user wishes
to subscribe, or register at a new site, only the identification
of the site is required from the user as long as his pertinent
information has not changed. If a new password or the like
is required, gatherer 67 through control module 73 may 
present log-in or password codes from a list of alternative 
codes provided by a user. In another embodiment, a database 
(not shown) containing a wealth of password options may be 
accessed by gatherer 67 for the purpose of trying different 
passwords until one is accepted by the site. Once a password 
or log-in code is accepted, it may be sent to a user and stored 
in his password list and at the network level. 

It will be apparent to one with skill in the art that a 
software application such as gatherer 67 may be imple- 
mented in many separate locations connected in a data 
network. For example, a plurality of gatherer applications 
may be distributed over many separate servers linked to one 
or more mass repositories. Client applications include but 
are not limited to a WEB-browser plug-in for communicat- 
ing to the service. Plug-in extensions may also be afforded 
to proxy servers so that auto-log-in and data access may still 
be performed transparent to a user. 

In another embodiment, plug-ins enabling communica- 
tion with gatherer 67 may be provided and configured to run 
on other network devices for the purpose of enabling such a 
device to initiate a request and get a response without the 
need for a desktop computer. 

In most embodiments a user operating a desktop PC will 
order a one time or periodic summary related to some or all 
of his subscribed WEB sites. A logical flow of an exemplary 
request/response interaction is provided below. 

FIG. 5 is a logical flow chart illustrating an exemplary 
summarization process performed by the software agent of 
FIG. 4 operating in a user-defined mode. In step 93, a user 
has initiated a new request for a summary (summary order). 
It is assumed for the purpose of discussion, that the request 
of step 93 involves a site wherein no template has been 
created. In step 95, the request is received and analyzed. A 
knowledge worker will likely perform this step. The new 
request may be posted to the user's portal home page, sent
directly to gatherer 67, or even communicated through 
e-mail or other media to the service. 

In step 97 a knowledge worker accesses particular site
logic associated with the request URL's. For example, if the
request involves a plurality of URL's, then all site logics for
those URL's are accessed. Logic may be available in a
repository such as repository 29 of FIG. 1 if it was
obtained at the time of user registration to a particular URL,
or sent in by WEB-site hosts shortly after registration. If it
is a completely new URL, then the logic must be obtained
from the site. In most cases however, the logic will be known
by virtue of a plurality of users accessing common URL's.
Therefore cross-linking in a database of logic/user associations
may be performed to access a logic for a site that is new
to one particular user, but not new to another.
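The cross-linking of logic/user associations can be sketched as a registry keyed by URL, so that logic stored for one user is found when another user requests the same site. The class `SiteLogicRegistry`, its method names, and the placeholder logic string are hypothetical; the schema of the actual database is not given in this description.

```python
from typing import Dict, Optional, Set


class SiteLogicRegistry:
    """Share one stored site logic among every user registered at a URL."""

    def __init__(self) -> None:
        self._logic: Dict[str, str] = {}        # URL -> site logic (opaque text here)
        self._users: Dict[str, Set[str]] = {}   # URL -> users associated with it

    def register(self, user: str, url: str, logic: Optional[str] = None) -> None:
        """Associate a user with a URL, storing logic if it is supplied."""
        self._users.setdefault(url, set()).add(user)
        if logic is not None:
            self._logic[url] = logic

    def logic_for(self, url: str) -> Optional[str]:
        """Return stored logic, or None if it must be obtained from the site."""
        return self._logic.get(url)

    def users_of(self, url: str) -> Set[str]:
        return set(self._users.get(url, set()))


registry = SiteLogicRegistry()
registry.register("alice", "https://shop.example/orders",
                  logic="orders table, columns 1-3")
registry.register("bob", "https://shop.example/orders")   # new to bob, already known
```

Here a request from the second user finds the logic already stored for the first, so nothing needs to be obtained from the site again.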

In step 99, the knowledge worker creates a template by
virtue of scripting module 79 (FIG. 4) containing all site
logic, URL's, log-in and password information, and the user
request information. As described previously, templates may
be re-used for a same request. In most cases, scripting may
be mostly automated with minimum manual input performed
by the knowledge worker. In many cases, an existing
template will match a new request exactly, and may be
re-used. In that case steps 97, 99, and 101 would not be
required.

In step 101 the template is stored and associated with the
requesting user. The stored template may now be retrieved
at a scheduled time for performing the summary gathering.
At step 103, a browser control such as module 85 of FIG. 4
is activated to access the stored template and navigate to
specified URL's for the purpose of gathering summary data.
If a timing function is attributed to the template stored in
step 101, then the template may self execute and call up the
browser function. In another embodiment, the knowledge
worker may notify the browser control to get the template
for its next task. In some embodiments, a plurality of
controls may be used with one template as previously
described.

In step 105, automatic log-in is performed, if required, to
gain access to each specified URL. In step 107, a specified
WEB-page is navigated to and parsed for requested data
according to the logic on the template. If there are a plurality
of WEB-pages to parse, then this step is repeated for the
number of pages. A variety of parsing engines may be used
for this process such as an IE™ parser, or a Pearl™ parser.
Only the requested data is kept in step 107.
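
The parsing of step 107 might be approximated, for illustration, with the standard Python `html.parser` module in place of the parsing engines named above. The class `FieldExtractor`, the function `parse_page`, and the use of element `id` attributes as the site logic are assumptions of this sketch. Nested markup inside a requested element is not handled.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Keep only the text of elements whose id is among the requested ids."""

    def __init__(self, wanted_ids):
        super().__init__()
        self.wanted = set(wanted_ids)
        self.current = None   # id of the requested element being read, if any
        self.found = {}

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id in self.wanted:
            self.current = element_id
            self.found[element_id] = ""

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current capture.
        self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.found[self.current] += data.strip()


def parse_page(html, wanted_ids):
    extractor = FieldExtractor(wanted_ids)
    extractor.feed(html)
    extractor.close()
    return extractor.found


page = ('<html><body><span id="balance">$120.50</span>'
        '<p id="ad">Buy now</p><span id="due">May 1</span></body></html>')
summary = parse_page(page, ["balance", "due"])
```

Content that was not requested, such as the advertisement paragraph in the sample page, is discarded.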

A request may be an on-demand request requiring immediate
return, or a scheduled request wherein data may be
posted. At step 109, such logic is confirmed. If the data is to
be presented according to a periodic schedule, then summary
data parsed in step 107 is stored for later use in step
111. In step 113, the summary data is rendered as HTML if
not already formatted, and displayed in the form of a
summary WEB-page in step 115. The summary page may be
posted for access by a user at a time convenient to the user
(pull), or may be pushed as a WEB-page to the user and be
made to automatically display on the user's PC. Notification
of summary page availability may also be sent to a user to
alert him of completion of order.

If the summary data is from a one-time on-demand
request and required immediately by a user, then a network
appliance and data delivery method (configured by the user)
is confirmed, and the data is rendered in the appropriate
format for delivery and display in step 117. In step 119, the
summary data is delivered according to protocol to a user's
designated appliance. In step 121 a user receives requested
information in the appropriate format.
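
The branch between scheduled posting (steps 111 through 115) and on-demand delivery (steps 117 through 121) might be sketched as follows. The `Request` class, the `appliance` values, and the two render functions are hypothetical and serve only to illustrate the decision made at step 109.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Request:
    user: str
    on_demand: bool
    appliance: str = "browser"   # hypothetical user-configured appliance type


def render_html(data):
    rows = "".join(f"<li>{escape(k)}: {escape(v)}</li>" for k, v in data.items())
    return f"<ul>{rows}</ul>"


def render_text(data):
    return "\n".join(f"{k}: {v}" for k, v in data.items())


def deliver(request, data, stored):
    """Return (delivery mode, rendered body) for parsed summary data."""
    if request.on_demand:
        # Render in the format suited to the user's designated appliance.
        if request.appliance == "pager":
            return ("push", render_text(data))
        return ("push", render_html(data))
    # Scheduled request: store the data, then post it as a summary page.
    stored[request.user] = data
    return ("posted", render_html(data))


stored = {}
immediate = deliver(Request("alice", True, "pager"), {"balance": "$5"}, stored)
scheduled = deliver(Request("bob", False), {"note": "<new>"}, stored)
```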

It will be apparent to one with skill in the art that there
may be more or fewer logical steps as well as added
sub-steps than are illustrated in this example. For example,
step 105 may in other embodiments include sub-steps such
as getting an encryption key from a user. In still another
embodiment, part of a request may be rendered as HTML as
in step 113 while certain other portions of the same request
data might be rendered in another format and delivered via
alternative methods. There are many possibilities.

The method and apparatus of the present invention may
be used to present summaries to users without user input. 
Process logic such as this is detailed below. 

FIG. 6 is a logical flow chart illustrating an exemplary
summarization process performed by the software agent of
FIG. 4 in a User-independent smart mode with minimum or
no user input. In step 117 an enterprise-initiated summary
process begins. In this case, the enterprise may be assisting
a user in finding a better deal or, perhaps presenting the
individual with summaries from and links to alternative
pages not yet subscribed to by a user.

In step 119, a database containing user information and
parameters is accessed and reviewed. Certain information
specific to a user may be required to initiate an enterprise-sponsored
summary report. At step 121, the knowledge
worker accesses the site logic specific to the specified target
site or sites for summarization. In step 123, the knowledge
worker modifies an existing user template, or creates a new
one if necessary. At step 125 the template is stored in a
repository such as repository 29 and associated with the user.

As described in FIG. 5, the template either self-executes
according to a timed function and invokes a browser control
such as control 85 (FIG. 4), or is accessed by control 85 as
a result of task notification. In step 127, the browser control
begins navigation. Auto log-ins are performed, if required,
in step 129 to gain access to selected sites. If the WEB pages
are new to a user, and the user has no registration with the
WEB site, then through agreement, or other convention, the
service may be provided access to such sites. Such an
agreement may be made, for example, if the host of the WEB
site realizes a possibility of gaining a new customer if the
customer likes the summary information presented. In many
other situations, no password or log-in information is
required to obtain general information that is not personal to
a client.

In step 131, all sites are parsed for summary data, which is
stored in canonical fashion in step 133. At step 135, the data
is compiled and rendered as HTML for presentation on a
summary page. In step 137, a WEB summary containing all
of the data is made available to a user and the user is notified
of its existence.

Providing certain information not requested by a user may
aid in enhancing a user's organization of his current business
on the WEB. Moreover, unsolicited WEB summaries may
provide better opportunities than the current options in the
user's profile. Of course, assisting a user in this manner will
require that the enterprise (service) have access to the user's
profile and existing account and service information with
various WEB sites on the user's list. A user may forbid use
of a user's personal information, in which case, no
enterprise-initiated summaries would be performed unless
they are conducted strictly in an offer mode instead of a
comparative mode.

The method and apparatus also may be practiced in a
language and platform independent manner, and be implemented
over a variety of scalable server architectures.

Deeper-Level Searching by Proxy 

As described in the background section, a conventional
search function cannot search beyond a first level of WEB-site
depth. A URL must be pre-known either to a user or to
a service providing data-search capability before it may be
returned as a result of a search. Vast numbers of URLs are
not indexed into any search engine databases and therefore
cannot be found by traditional key-word searching. An
overview of prior art implementation of a traditional data-search
process as practiced on the Internet is provided
below.

FIG. 7 is an architectural overview of a system employing
a conventional data-search process on a DPN
according to prior art. A communication network 139 is
exemplified in this prior-art example as a common architecture
for facilitating network data searches. Network 139
comprises the well-known Internet network 141, a well-known
public-switched-telephony network (PSTN) 143, an
Internet Service Provider (ISP) 145, and an exemplary user
premise 147.

It is widely known and accepted in the art that Internet 
141, PSTN 143, ISP 145, and user premise 147 represent a 
communication architecture (network 139) commonly used
by the public for searching out and obtaining network- 
sourced data. 

Internet 141 has an Internet backbone 157 illustrated 
therein and intended to represent the many lines and con- 
nection points making up the Internet network as a whole. 
Two search provider (SP) servers, server 149 and server 151 
are illustrated as connected to backbone 157, and are 
adapted to provide Internet data-search services to the public 
at large as is generally known in the art. A search provider 
is defined, for the purpose of this example, as an enterprise 
engaging in providing WEB-sourced data made accessible
to users through server capabilities. Altavista™ and
Yahoo™ are well-known examples of search providers. Such
enterprises may also provide other services such as portal 
services and so forth. 

SP server 149 has a data store 153 connected thereto by 
a data link. Data store 153 is adapted to contain cached 
URLs, which are compiled and indexed according to enter- 
prise rules and which are accessible through a search-engine 
application illustrated as SW 163 running on server 149. 
Data store 153 may be any kind of suitable data repository 
capable of storing large amounts of data. Data store 153 is 
typically an on-line data repository which is accessed by 
server 149 when matching data-search queries to data con- 
tained in data store 153. 

SP server 151, a connected data store 155, and an instance 
of SW 165 may be described as replicated components of 
server 149, data store 153, and SW 163, except that
differing enterprises may host such services. For example,
Altavista™ may host server 149, data store 153, and SW 163 
while Excite™ may host server 151, data store 155 and SW 
165. Slight differences may exist between the separate 
enterprises hosting the aforementioned equipment. 
Therefore, physical differences may exist in the services 
offered as well as in SW and hardware implementations. The 
inventor chooses to focus only on the standard data-search 
functionality common to both equipment and SW groups. 
Therefore, each group is represented with identical capa- 
bilities in this example. 

Two WEB servers (WS), 159 and 161 are illustrated as 
connected to backbone 157 in Internet 141. WSs 159 and
161 are adapted as normal file servers as known in the art. 
Servers 159 and 161 host electronic information pages 
addressed by URLs and are adapted to serve them on 
authorized request from any other network-connected node. 
Electronic WEB-pages are typically formatted in well- 
known Hyper-Text-Mark-up Language (HTML). The URL 
is actually the unique server address of an information page 
as is well known. 

PSTN 143 represents the most common telephony net- 
work used to access Internet 141. PSTN 143 may be 
assumed to contain all of the required equipment for 
enabling telephony communication and connection, including
telephony switches, routers, service control
points (SCP), network bridging stations, and so on.

ISP 145 is provided within PSTN 143 and is adapted to
perform Internet-access services as known in the art. ISP 145 
comprises a modem bank 171, represented herein by a single 
modem icon, and an Internet connection server 169 adapted 
to connect subscribers to Internet 141. Connection server 
169 is illustrated as having connection to Internet backbone 
157 by an Internet access line 167. Access line 167 may be 
any suitable connection means known in the art for main- 
taining Internet connectivity for a plurality of users access- 
ing Internet 141 through server 169. 

User premise 147 comprises a personal computer (PC) 
175, which is adapted by SW and hardware implementation 
for communication on Internet 141. PC 175 is illustrated as 
connected to modem bank 171 by an Internet access line 
173, which may be any connection means known in the art 
for providing Internet access to user premise 147. Examples 
include normal plain old telephone service (POTS) line, 
Integrated Services Digital Network (ISDN) line, Cable/ 
Modem line and so on. In this example, PC 175 uses a 
dial-up method and ISP 145 to access Internet 141 as is most 
common in the art. 

A browser application 177 is provided and illustrated as 
executing on PC 175 indicating that PC 175 is engaged in a 
browsing session on Internet 141. A search engine, repre- 
sented within browser 177 by the letters SE is incorporated 
by a user operating PC 175 for the purpose of data search as 
is known in the art. A user operating PC 175 and connected 
to Internet 141 through ISP 145, as illustrated by the 
described connections, may invoke an SE through applica- 
tion 177 and thus connect to one of SP servers 149 or 151 
in Internet 141. The exact server connection will depend on 
the proprietary search option listed in application 177 and 
selected by a user. Using the examples presented above, if 
the search option chosen is Altavista™, then PC 175 will be 
connected to SP server 149 hosted by Altavista™. If the 
chosen option is Excite™, then PC 175 would be connected 
to SP server 151 hosted by Excite™. Such methods are 
known in the art and many different search providers hosting 
separate data services may be represented for selection in 
application 177. 

Assuming that a user operating PC 175 is connected to
Internet 141 through one of several methods provided as 
examples above, a data search may be initiated from appli- 
cation 177 by invocation of search option SE provided as a 
link in application 177. Assuming that upon invocation of 
SE in application 177, a connection to SP server 149 is 
made, then an interactive HTML page representing a data- 
search interface is served to the connected user. SW 163 
running on server 149 then processes any initiated data 
search according to a query entered into a search dialog box 
provided with the HTML interface as is known in the art. 

In processing a query from a user operating PC 175, SP
server 149 running SW 163 checks data store 153 for any 
URL pages contained therein that have data content associ- 
ated therewith that matches (to some extent) criteria accord- 
ing to the entered query. As described in the background 
section, a query may be a key word, a series of key words, 
a phrase or the like. Server 149 running SW 163 returns any 
matching URL's from data store 153, where they appear in 
listed fashion in application 177. URL results are often 
termed "hits" in the art. There may be only a few or a great
number of "hits" returned depending on the nature (broad or 
narrow) of the original query entered, and the richness of the 
Internet content. Each hit represents a hyper-link to an 
electronic WEB page that may be hosted, in this example, by 
server 159 or server 161, or any other network-connected 
server. Therefore, invoking a returned URL initiates navigation
by browser 177 to either server 159 or 161 wherein
the updated version of the HTML page is served. At this 
point the aforementioned user is negotiating with server 159 
or 161. 
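
The conventional key-word matching described above might be illustrated, in greatly simplified form, by a small inverted index. The functions `build_index` and `search` and the sample URLs are assumptions of this sketch and do not represent the software of any named search provider. Only pages already present in the index can be returned, which is the limitation discussed in this section.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lower-cased word to the set of indexed URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs ('hits') whose indexed text contains every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


pages = {
    "http://a.example/": "bios flash upgrade",
    "http://b.example/": "flash memory prices",
}
index = build_index(pages)
```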

There are other possible aspects of connection and communication
represented in this prior-art example as well. For
example, an enterprise hosting SP server 149 may through
agreement forward a query to the enterprise hosting SP
server 151 such that data store 155 may be included in a data
search. With this type of cooperation, many resources may
be accessed in a shared sense. Therefore, if an original query
does not return a URL from one data store, an option may
exist for searching data stores hosted by other enterprises
without a user having to close one connection and open
another. This process is fairly recent and is termed meta-searching
in the art.
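
The query forwarding just described might be sketched as a simple fallback across providers. The function `meta_search` and the provider names are hypothetical; each search function stands in for a data store hosted by a separate enterprise.

```python
def meta_search(query, providers):
    """Try each (name, search_fn) in order; return the first non-empty result."""
    for name, search_fn in providers:
        hits = search_fn(query)
        if hits:
            return name, hits
    return None, []


# Stand-ins for two cooperating search providers.
store_153 = {"modem drivers": ["http://a.example/drivers"]}
store_155 = {"bios flash": ["http://b.example/bios"]}
providers = [
    ("provider_149", lambda q: store_153.get(q, [])),
    ("provider_151", lambda q: store_155.get(q, [])),
]
```

The user issues one query over one connection; the second data store is consulted only when the first returns nothing.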

A limitation of the prior art exists in that software
instances 163 and 165 are adapted only to provide URL's
and data that is indexed in either data store 153 or data store
155. An enterprise hosting server 159 or server 161 may also
have connected data-stores containing information related to
electronic pages that are hosted therein. Such data stores
hold data on a deeper level of WEB-site depth and may be
accessed through manual navigation from a main URL or
through a private search function (limited to searching data
hosted by the enterprise) provided as an embedded module
in one or more of the hosted main pages. Software instances
163 and 165 cannot provide access to a private search
function unless it is functionally available in either server
149 or server 151. A user must invoke the private search
function after he or she is served the hosting page in order
to search a private data-store. Moreover, a user must often
restructure a query for application to the new search engine
software as the query rules may be different than those
associated with SW 163 or SW 165.

It can be seen, in a general sense, by one with skill in the
art that the prior-art data search methods illustrated in this
example are limited both by the fact that only data indexed
by URL may be found, and by the fact that additional
deeper-level data searches must be performed manually
through user-initiated browser navigation.

The inventor provides a unique method and apparatus that
enables a deeper-level data search to be accomplished
through an original SE application wherein no query
re-entering by a user or additional browser navigation by a
user is required. Such a method and apparatus is described
in enabling detail in examples below.

FIG. 8 is an architectural overview of a search method for
data on Internet 141 according to an embodiment of the
present invention. Much of the architecture and connection
means illustrated in this preferred embodiment mirror those
of the prior-art example of FIG. 7. Therefore, elements
common to both examples retain the same element numbers
and are not re-introduced. Components unique to the present
invention whether by modification or by provision are newly
introduced and given new element numbers.

In this example of the present invention, the Internet
connection means is the same as described in FIG. 7 above.
A user operating PC 175 is connected to modem bank 171,
hosted by ISP 145, by virtue of Internet access line 173.
Connection server 169, also hosted by ISP 145, facilitates
connection to Internet backbone 157 within Internet 141
through Internet access line 167. However, instead of using
a general search engine as was illustrated in FIG. 7, a user
operating PC 175 is a subscriber to the personalized portal
service described in disclosure included herein and referenced
as Ser. No. 09/208,740 in the cross-reference section
above. As such, a connection is opened to a portal server 179
upon Internet log-in from user premise 147 and a portal page
illustrated as PP is served by server 179 and appears within
a browser application 178.

Browser application 178 is enhanced for communication
with portal server 179 by virtue of provided SW plug-ins
(not shown), which are adapted for enabling auto-log-in to
personal WEB pages, initiating special tasks to be performed
by server 179, among other options which are fully
described in the related documents Ser. No. 09/523,598 and
Ser. No. 09/208,740. A user operating PC 175 while connected
on-line to portal server 179 may interact with the
provided PP in browser 178 to search for updated data from
one or all of his or her service-registered WEB pages. In this
system, portal server 179 is enhanced with a navigation
control for browsing on behalf of a user operating PC 175.
In general, such navigation and return of data is limited to
sites that are known to the service and/or to the user. For
example, navigation to sites for data acquisition on behalf of
a user is accomplished with site-logic scripting, parsing and
data-return techniques known to the inventor and described
above. The portal service uses a system of connected nodes
to process the many requests from users.

A data store 185 is provided and illustrated as connected
to portal server 179 by data link. Data store 185 is adapted
to contain and manage data including but not limited to
profile and subscription data about users, data about user-registered
sites, password and user-names associated with
those sites, and navigation scripts for accessing such sites on
behalf of users. Data store 185 may be a series of separate
data repositories all connected to server 179, or a single
repository as represented herein, or a part of portal server
179. Data store 185 may be of any suitable implementation
such as an optical storage facility or the like. In this example,
server 179 and connected data store 185 are held within
Internet 141 with server 179 directly connected to backbone
157. However, in another embodiment, server 179 and data
store 185 may be hosted by and held within ISP 145 as
represented in FIG. 1.

Three WEB servers (WS) 181a-c are illustrated as connected
to backbone 157 in Internet 141. WEB servers
181a-c are adapted as Internet file servers as described in
FIG. 7 (WS 159, WS 161). However, in this embodiment
each WS 181a-c has at least one main HTML page hosted
therein that contains a private search engine (SE) embedded
therein as illustrated by associated flags labeled SEa, SEb,
and SEc respectively.

An on-line database 187 is provided and illustrated as
connected to backbone 157 within Internet 141. Database
187 represents an on-line storage facility containing additional
HTML pages hosted by WEB servers 181a-c. Database
187 may be a single data repository shared by servers
181a-c as is represented herein or database 187 may represent
a separate database for each of WEB servers 181a-c.
Database 187 stores electronic WEB pages that may be
accessed through a private SE hosted in any one of or all of
servers 181a-c. For example, WS 181a may be hosted by
Intel™. As such, electronic pages contained in database 187
represent deeper-level electronic pages containing information
related to Intel™ and accessible through SEa hosted at
server 181a, but not indexed by a regular SE database such
as, perhaps, Altavista™. WS 181b may be hosted by Gateway™
and an embedded SEb, also hosted by Gateway™
may be used to search database 187 for URLs related to
Gateway™ such as computer specifications, chip
parameters, install instructions, and so on.

It is important to note here that pages having URLs
maintained in database 187 cannot typically be accessed 
through a conventional search method because they repre- 
sent a deeper level of WEB data not indexed in either data 
store 153 or data store 155 of FIG. 7. The additional pages 
are only accessible through use of embedded SE applications
found on, for example, a main electronic page or pages
hosted in servers 181a-c, or through manual navigation
from one of the main URLs providing links to the deeper- 
level information. A private SE may be a search function 
dedicated to providing access to additional technical service- 
related URLs hosted by an enterprise. The specific SE may 
be labeled "search our technical service site", for example, 
and may be configured to search by key word or phrase. The 
search provided is, of course, limited to enterprise-hosted 
databases such as database 187. 

In a conventional sense (negotiating with the server 
hosting the SE), one would enter a key word or the like into 
the private SE as described above and would be presented 
with a list of hyper-links to the additional pages hosted by 
the enterprise which would appear in a user's browser 
application. The additional URLs may also be linked by 
icons found in various electronic pages contained in servers
181a-c and hosted by the respective enterprises. The use of
a private SE of the type described herein allows faster access 
to data and reduced manual navigation time for users. 

The inventor herein teaches and provides a unique appli- 
cation extension that enables a seamless bridge between a 
conventional SE and a private SE. A SW application 183, 
illustrated as executing on portal server 179, provides such 
enhanced functionality. In this example, SW 183 is a per- 
sonalized search function provided by the enterprise hosting 
server 179 and the portal service, which is available to users 
typically through subscription. SW 183 may be invoked by 
a user operating PC 175 at user premise 147 by clicking on 
an available link presented in a PP (Portal Page) within 
browser application 178. 

Once SW 183 is invoked, a user operating through 
interface 178 enters a natural language query designed to 
search for specific data. It is assumed in this example that 
specific data requested is not contained in any of the URLs 
for pages registered with the portal service. It is also 
assumed that the requested data is available in a deeper level 
of data which may be accessed through use of one or more 
private SEs hosted by one or more of the user's registered 
WEB services. 

To further illustrate, consider that WS 181a is a Hewlett 
Packard™ server registered to the portal service by a user 
operating PC 175. PC 175, in this example, may be a 
Hewlett Packard™ machine such as a Pavilion™ model 
machine. A query entered into a PP search dialog box may 
be, for example, "Bios flash upgrade information for Pavil- 
ion". SW 183 parses the entered query and processes the 
query by checking data store 185 for any related data. It is 
found that WS 181a (Hewlett Packard™) is a user-registered 
WEB site and is a likely URL for containing data related to 
the query. In one embodiment, a user may make a registered 
URL an integral part of a query command. For example, the 
query may read "search my HP WEB site for 'Bios flash 
upgrade information for Pavilion'". The double quotations 
illustrated in the command query may be used to separate the 
command portion from the query portion although this is 
merely exemplary. There are many ways to express 
command/query combinations. 
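
One possible way to separate the command portion from the query portion is sketched below using a regular expression. The pattern `COMMAND`, the function `split_command`, and the `registered` mapping of site nicknames to URLs are assumptions made for this example; as noted above, many other command/query forms are possible.

```python
import re

# Hypothetical command grammar: search my <site> WEB site for '<query>'
COMMAND = re.compile(
    r"^search my (?P<site>.+?) WEB site for '(?P<query>.+)'$",
    re.IGNORECASE,
)


def split_command(text, registered):
    """Return (registered URL or None, query portion) for an entered string."""
    text = text.strip()
    match = COMMAND.match(text)
    if not match:
        # No command portion: treat the whole entry as the query.
        return None, text
    site = match.group("site").upper()
    return registered.get(site), match.group("query")


registered = {"HP": "https://hp.example/support"}
entry = "search my HP WEB site for 'Bios flash upgrade information for Pavilion'"
target, query = split_command(entry, registered)
```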

SW 183 uses a navigation sub-system (not shown), which 
is known to the inventor, to navigate to HP server 181a on 
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the user's behalf and perform auto-log-in to access a main to a proxy-navigation system. If data matching a query is not 

URL contained in server 181a to which the user subscribes. found in a connected database, then navigation may be 

The requested information is not contained in the main URL, required to obtain the requested data. Auto-log-in services 

but may be available through a private SE embedded in the may be performed during navigation to gain access to 

main URL at server 181a (illustrated by flag SEa). SW 183 5 user-registered sites. 

is, in one embodiment, adapted to recognize the code that Search application 184, as known to the inventor, is not 

identifies the embedded SE and is adapted by software the same as a tradiu'onal search engine used for generic data 

routine to locate and invoke the private SE at the main URL searches on the Internet. Application 184 is enhanced for 

in server 181a. In another embodiment, the private SE integration into the Password-all software suite described in 

parameters such as data entry rules are pre-known and are 10 Ser. No. 09/208,740 and the method for obtaining and 

accessible from data store 185. presenting WEB summaries described in Ser. No. 09/523, 

Once the private SEa is open, SW 183 transfers the 598. A basic example of using search application 184 is 

original query into the dialog box provided and executes the described in the embodiment of FIG. 5 above. In this 

search function by virtue of automated routine. If required, embodiment, Auto-log-in is performed during navigation to 

SW 183 may restructure the query to fit the rules used by the 15 g a { n access to user- registered sites, which require a user 

private SE. Data returned by the private SE is gathered by name and/or password for authentication. Data is found 

a navigation control and returned to server 179 where it may through parsing and site logic scripting. The function of 

be forwarded to the portal page (labeled PP in FIG. 8) in search application 184 assumes that there is sufficient pre- 

browser interface 178. A user may then click on any addi- known information available about the data source and data 

tional URL listed and navigate to that electronic page hosted 20 location in the source for successful navigation and parsing, 

in this case at database 187, and view the data. Application extension 186 is provided to extend the 

The search, navigation, and data-return process is trans- function of application 183 to provided a seamless interface 

parent to the requesting user as is the auto-log-in process. to a second search application which may be specific to an 

The next page the user sees is a list of related links to data enterprise hosting a WEB site comprising am plurality of 

about "Bios flash upgrade instruction". In some cases, the 25 pages having URLs. Application 186 enables SW 183, in 

additional links may appear on the same PP within browser cooperation with a proxy-navigation system, to navigate to 

178 by virtue of an automated linking process known in the and commandeer the second search engine and cause that 

art. By clicking on any one of the provided links, a user may engine to search for and return data on behalf of a user, 

navigate to the selected page and view the data contained ^ A ^ recognition module 199 is provided and adapted 

therein. to TGC ogfivLe an embedded search function held within a 

SW 183 thus provides a proxy searching function that URL opened during proxy navigation. In this way, SW 183 

may be practiced by a user from a single interface and using may find any second search function embedded in any URLs 

an original query typed into a first search dialog box. A user subject to navigation and search. In one embodiment, such 

practicing this method is not required to manually navigate ^ search functions are pre-located when a user registers a new 

until he or she is presented with a list of links related to the URL to the service such that their parameters and location 

deeper level data held in database 187 in this example. may be made part of site-logic scripting. 

It will be apparent to one with skill in the art that the An application-activation module 201 is provided within 

functionality of SW 183 is in part generic to and in accor- extension layer 186 and adapted to invoke or activate an 

dance with similar capabilities described in the related ^ embedded search function. In some cases an embedded 

documents listed under the cross-reference section. Addi- search function on will be presented in the form of an icon 

tional components added to SW 183, which provide a novel such that when invoked, a dialog box appears as a pop-up 

interface capability between SE applications are detailed widow or as a new URL. In some cases, a dialog box will 

further below. already be present and module 201 may not be required. 

FIG. 9 is a block diagram illustrating exemplary software
components of a search-function interface according to an
embodiment of the present invention. SW 183 comprises a
data-search module 184 and an application-extension layer
186. Search module 184 is similar in many respects to
traditional search engines except for the presence of a
browser control interface 195, and an interface to auto-log-in
function 197.

Control interface 195 is provided and adapted as an
enhancement that allows interface to a navigation system for
browsing known URLs on behalf of users. Interface 197 is
provided and adapted to allow auto-log-in functions to be
performed on behalf of a user upon navigation to a
user-registered URL for the purpose of obtaining data requested
by a user.

An input module 189 is provided and adapted to accept
query data input into SW 183 by interfacing users. A parsing
engine 191 is provided and adapted to read and understand
data queries for purpose of further processing data requests.
A database interface module 193 is provided and adapted to
allow interface to any connected repository to search for
data that may be compared against a query for match.
Browser control 195, as previously described, is an interface

A text writer 203 is provided and adapted to rewrite an
original query into a form accepted by the search dialog
criteria associated with the second search function. If
required, writer 203 may restructure an original query to fit
the new criteria in terms of punctuation, casing, order of
words, association of words, and so on. In a preferred
embodiment, such rules are pre-known and are a part of site
logic. In an alternate embodiment, writer 203 simply
produces the original query for insertion into the dialog box
wherein no restructuring is required.

A data-transfer interface 205 is provided and adapted to
allow SW 183 to insert an original query into a provided
dialog box by known techniques such as object linking and
embedding (OLE). An execution and release module 207 is
provided and adapted to execute a second search function
after a query has been entered. At this point, the data search
function is turned over to the new search function, which
returns results back to the proxy navigation control.
Application extension 186 actively runs in conjunction with the
navigation system in integrated fashion to achieve the main
object of the present invention, which is to enable a seamless
interface between search applications such that a deeper
level of data searching may be achieved.
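
As an illustration only, the restructuring attributed to text writer 203 might be sketched as follows. The class `SiteRules`, its fields (`lowercase`, `strip_punctuation`, `joiner`), and the function `rewrite_query` are hypothetical names that do not appear in the specification; they stand in for whatever "site logic" rules govern punctuation, casing, and association of words for a given second search function.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteRules:
    """Hypothetical per-site query rules (all names are assumptions)."""
    lowercase: bool = False          # casing rule
    strip_punctuation: bool = False  # punctuation rule
    joiner: str = " "                # association of words, e.g. " AND "


def rewrite_query(query: str, rules: Optional[SiteRules] = None) -> str:
    """Rewrite an original query to fit a second search function's rules.

    When no rules are known (the alternate embodiment), the original
    query is passed through unchanged.
    """
    if rules is None:
        return query
    text = query.lower() if rules.lowercase else query
    if rules.strip_punctuation:
        # Drop every character that is neither a word character nor whitespace.
        text = re.sub(r"[^\w\s]", "", text)
    # Re-associate the remaining words with the site's joiner.
    return rules.joiner.join(text.split())
```

Under these assumed rules, a site that expects lowercase Boolean queries would be described by `SiteRules(lowercase=True, strip_punctuation=True, joiner=" AND ")`, while passing no rules reproduces the original query verbatim.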
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Data returned by the second search function invoked by 
SW 183 is handled in the same way as described in FIG. 5 
steps 111, 113, 115, and steps 117, 119, and 121. Automatic 
linking capability allows a user receiving requested data 
links to navigate back to data contained therein. In some 
cases data located will be returned as text data with no 
linking required. 

It will be apparent to one with skill in the art that the
software components included in SW 183 may be provided 
to coordinate through interface with a separate proxy navi- 
gation system as known to the inventor, or may be func- 
tionally provided within the navigation software itself with- 
out departing from the spirit and scope of the present 
invention. In a preferred embodiment, the components 
described above are Java-based executables designed to 
function as a routine during Internet navigation. 

The method and apparatus of the present invention pro- 
vides a unique way for users to gain information by proxy 
from deeper levels of WEB sites without requiring exhaus- 
tive manual navigation and repeated re-entering of queries to 
new search functions. 

In one embodiment of the present invention, more than 
one secondary search function, perhaps associated with 
more than one URL may be invoked simultaneously such 
that data returned to the gathering agent is from several 
different sources or sites. 
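
A minimal sketch of invoking several secondary search functions at once is given below. It assumes each site's search function can be modeled as a callable that takes the query and returns a list of results; the name `search_all` and the use of a thread pool are illustrative choices, not details taken from the specification.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def search_all(
    query: str,
    site_searches: Dict[str, Callable[[str], List[str]]],
) -> Dict[str, List[str]]:
    """Invoke every site's search function concurrently.

    Returns a mapping from each URL to the results its search produced.
    """
    if not site_searches:
        return {}
    with ThreadPoolExecutor(max_workers=len(site_searches)) as pool:
        # Submit one task per site so the searches run side by side.
        futures = {
            url: pool.submit(search, query)
            for url, search in site_searches.items()
        }
        # Collect results per URL; result() blocks until that search ends.
        return {url: future.result() for url, future in futures.items()}
```

The gathering agent would then merge the per-site results however the embodiment requires; that merging step is left out here because the text does not specify it.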

FIG. 10 is a process flow diagram illustrating basic
interaction steps for practicing the present invention
according to a preferred embodiment. At step 209, a user begins
an on-line session with a portal server as exemplified in FIG. 8.
During this process, a user-name and password pair is 
submitted to a portal server by a user for authentication 
purposes. After authentication of a user, a personal portal 
page (PP of FIG. 8) is displayed in a user's WEB browser 
at step 211. In this step, a dialog box for SW 183 will appear 
in some convenient location on the portal page. 

At step 215, a user enters a query for a data search. The
query may be entered in a natural language as previously
described in the example of FIG. 8. At step 213, SW 183
processes the query for a WEB search. During this process,
any connected databases are consulted for matching data
before navigation is initiated. If the required data is
contained in a connected database, navigation and proxy
searching may not be required. For example, if a user requests
data about "technical specifications for white diamonds", then
a first "look" into a database may return a user-registered site
about diamonds and other minerals. The URL would match
the user's query but the exact data may not be found on the
URL page.

Assuming that no matching data is found, navigation to 
the related URL is initiated through browser control inter- 
face at step 217. Proxy navigation to the URL or URLs that 
most closely relate to a user query is performed by a 
navigation sub-system. Auto-log-in is performed if required 
for entry into a site. 

At step 219, any private search functions associated with 
the site and available on the main URL page or pages are 
located and invoked. At step 221, original query data entered 
at step 215 is transferred to a new dialog box associated with 
a new search function. At this point, the search is handed 
over to the respective WEB site or sites. At step 223, data 
results from the secondary search, which may be in the form 
of text, additional URL links, or a combination thereof, are 
passed back to the navigation control. These results repre- 
sent data that could not have been obtained through con- 
ventional search methods because such methods are limited 
to a first WEB-site depth. 
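
The interaction of steps 213 through 223 can be sketched roughly as follows. This is an assumption-laden outline rather than the inventor's implementation: the database lookup, the selection of related URLs, and the detection of a site's private search function are passed in as hypothetical callables (`lookup_local`, `related_urls`, `find_site_search`), since the text does not say how each is realized.

```python
from typing import Callable, Iterable, List, Optional

SiteSearch = Callable[[str], List[str]]


def proxy_search(
    query: str,
    lookup_local: Callable[[str], Optional[List[str]]],
    related_urls: Callable[[str], Iterable[str]],
    find_site_search: Callable[[str], Optional[SiteSearch]],
) -> List[str]:
    """Return results for `query`, falling back to site search functions."""
    # Step 213: consult connected databases before any navigation.
    local = lookup_local(query)
    if local:
        return local

    results: List[str] = []
    # Step 217: navigate by proxy to the URLs most related to the query.
    for url in related_urls(query):
        # Step 219: locate a private search function on the page, if any.
        site_search = find_site_search(url)
        if site_search is None:
            continue
        # Steps 221-223: hand over the original query and collect results.
        results.extend(site_search(query))
    return results
```

Auto-log-in, query restructuring, and the choice between immediate return and storage are omitted to keep the outline short.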
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If a user requires immediate data return, the results are 
passed back to the user's WEB browser at step 225. If a user 
will access the results at a later date, then the results may be 
held in storage on behalf of the user at step 227. 

It will be apparent to one with skill in the art that the basic
process-interaction steps represented herein may be
expanded in description without departing from the spirit
and scope of the present invention. For example, step 209
may include sub-steps such as supplying password and user
name for authenticating. A step for invoking an original
search application may be provided between steps 211 and
215 if an open dialog box does not appear with the served
portal page (PP). There are many possibilities. The inventor
intends that the process steps represented herein are only
exemplary of one suitable process among many for
practicing the present invention.

It will also be apparent to one with skill in the art that SW
183 of FIG. 9 may be a standalone application with
appropriate interface capability to a navigation sub-routine
without departing from the spirit and scope of the present
invention. In still another embodiment, application 183 may
be integrated with a navigation sub-routine such that
navigation capability is part of the direct functioning of SW 183.

The method and apparatus of the present invention may
be practiced in a personalized sense as is described in
previous embodiments wherein URLs are registered to users
and auto-log-in services are performed on behalf of users
subscribing to portal services.

In another embodiment, the method of the present
invention including proxy navigation capabilities may be
provided as an extension to existing and well-known search
engines that are provided to the public without subscription.
Such search engines are typically used to search for more
generalized data, and users do not have pre-knowledge of
where requested data is held. A general search engine
executing from a server may, if enhanced with the SW of the
present invention, provide a deeper level of data searching
than is currently offered. Such an embodiment is detailed
below.

FIG. 11 is a block diagram illustrating the standard
data-search system of FIG. 7 enhanced with the method and
apparatus of the present invention according to another
embodiment of the present invention. In this example, user
premise 147 is enabled by virtue of browser application 177
to browse the Internet as described in FIG. 7. A standard
search engine is illustrated within browser 177 and is an
interface to search-provider (SP server) 151. User premise
147 has connection to server 151 by virtue of an
ISP-brokered network-connection illustrated herein by the
double arrow labeled "Network Connection (ISP)". Such a
connection is analogous to the compilation of lines 173 and
167 of FIG. 7.

Server 151 has an enhanced search engine 229 executing
thereon and adapted to allow added services according to an
embodiment of the present invention. For example, engine
229 is enhanced with addition of a proxy browsing control
195, which allows interface to a general version of the
personalized navigation system described above. What is
meant by "a version of" is that no site logic is employed to
look for specific data known to exist.

Search engine 229 is also enhanced with a "generalized
version" of the personalized application 186 of FIG. 9,
meaning that there is no interface for auto-log-in.
Application control 195 and extension 186 may be provided with a
navigation sub-routine and integrated into a standard search
engine (SE) producing the enhanced engine 229.






US 6,278,993 Bl 


Alternatively, engine 229 has a navigation control or
interface to a separate browsing sub-system, which may run on
the same server (151), or another connected server or set of
servers.

Server 151 accepts a query from user 147 running
application 177 and using a search engine (SE) interface. The
query may be a general request for data about a certain class
of IC chips, for example. The query may contain keywords
or a series of keywords describing the desired chips.
Alternatively, a phrase may be entered instead of keywords.
This depends on any rules that are in place and observed by
SW 229. In normal operation SW 229 retrieves URLs
containing any data matching the user's query as illustrated
by the right-angled, double arrow labeled "URLs" placed
between server 151 and data store 155. Data store 155
contains indexed URLs that may contain data that matches
a user query.

In this example, such URLs are, as would be the normal
case, returned to user premise 147 over the network
connection where they appear in a displayed search page within
browser window 177. A user may then select a return link to
navigate to the electronic page indexed by the URL link.

Some of the URLs indexed in data store 155 may contain
embedded search functions representing private search
capabilities along with data matching the criteria of the
original query. Those URLs may be automatically assigned
for proxy browsing on behalf of the user wherein control 195
and extension 186 are employed to navigate to the pages on
behalf of a user and invoke the secondary search engines to
return deeper level data or URLs according to the original
query. In this case, the interface to auto-log-in function 197
would not be required and no site-logic scripts are used.
However, all of the other described modules of FIG. 9 may
be employed. Many URLs having private search functions
embedded therein may be found during the initial search.
Therefore, there may be a rule administered that limits the
number of private search engines that may be invoked on
behalf of a user. An example of such a rule may be "navigate
to only the top ten URLs that match the query by ranking
percentage and invoke deeper level searches according to
the original query".

In another embodiment, URLs found to contain private
search functions are sent back to user premise 147 along
with other matching URLs or "hits" and appear in browser
177, but are listed separately. In this case, a user may select
a number of those URLs (containing search functions) for
proxy navigation, search execution, and data return.
Returned data may, in some cases, be delivered as text
instead of additional links for manual navigation. In this
case, the process would contain an extra step of a user
selecting a number of returned URLs containing search
functions, and then submitting the selection to SP server 151
for proxy navigation, data search, and data return. Selection
may, in some cases, be facilitated by check boxes presented
next to each URL. Checking a box indicates to include this
URL in proxy navigation.

On-line database 187, as previously described in FIG. 7,
represents a repository or repositories held by individual
WEB sites such as sites 159 and I5l of FIG. 7. Data and
URL links contained in database 187 represent deeper-level
WEB site data available through a private search function or
through manual link activation and navigation from a main
URL. The method and apparatus of the present invention
provides a convenient method for searching and returning
data held on a deeper level of WEB site depth without
requiring a user to manually navigate to the data from a
"jump-off page" or, without manually invoking a private
search function and entering an additional query to search
for the data.

In the embodiment presented herein, it is noted that the
exact parameters pertaining to rules for entering queries into
private search functions is not pre-known as the system
described in this embodiment is not personalized to a user.
Therefore re-structuring of an original query may not be
possible. However, it is assumed that some standardization
[illegible] code embedded [illegible] search [illegible]
administered for dialog entry into [illegible]. SW 229 may be
pre-programmed then to understand and recognize such standard
parameters such that recognition of code and restructuring of
a query is still possible. In this instance, known codes and
rule-sets would be pre-loaded into a database accessible to
SW 229 such that the correct codes and rule-sets may be
found by parsing and comparison.

It will be apparent to one with skill in the art that SW 229
may be adapted to work in conjunction with a navigation
system in a multi-tasking environment without departing
from the spirit and scope of the present invention. For
example, many user queries may be processed
simultaneously and the only limit to the number of URLs that may
be [illegible] on behalf of a [illegible] of users is the
processing power of the dedicated node or nodes performing
the navigation and data-return functions. In another
embodiment, SW 229 and a navigation system may be one
application running on one powerful server. Scalability and
component distribution may be implemented according to
need. There are many possibilities.

The method and apparatus of the present invention may
be practiced via private individuals on the Internet,
businesses operating on a WAN connected to the Internet,
businesses operating via private WAN, and so on.
There are many customizable situations.
The present invention as taught herein and above should be
afforded the broadest of scope. The spirit and scope of the
present invention is limited only by the claims that follow.

What is claimed is:

1. A method for extending an on-line Internet search
beyond pre-referenced sources, comprising steps of:

(a) entering a first search criteria in a first search function;

(b) initiating the first search function;

(c) returning in the first search function a pre-referenced
first document having data associated with the first
search criteria;

(d) testing the first document for an embedded
search function;

(e) on [illegible] document, automatically entering at
least a form of the first search criteria in the second
search function; and

(f) returning addresses in the first search function for
documents found through the second search function.

2. The method of claim 1 wherein the first search function
allows natural language criteria entry, and
comprises a parsing step for parsing criteria input for
significant words and phrases for criteria matches.

3. The method of claim 2 wherein the first search function,
in step (e), tests the second search function for criteria rules,
and amends the first search criteria to conform to the criteria
rules.

4. The method of claim 1 wherein the first search function
is provided by a subscription portal service, and is operated
by proxy by subscribers.

5. The method of claim 4 wherein the first search function
is limited in step (c) to returning first documents
pre-registered to a specific subscriber invoking the first search
function.
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6. An Internet search application comprising: 

a first search module having a first criteria interface for 
entry of a first search criteria; 

an inspection function for identifying a second search 
module in a returned electronic document, the second 
search module having a second criteria interface; and 

an entry module for entering the first search criteria into 
the search criteria interface of the second search mod- 
ule; 

characterized in that the search application, upon entry of 
a first search criteria in the first criteria interface, 
returns at least one electronic document having a match 
to the first search criteria, inspects the document for the 
second search module, and transfers at least a form of 
the first search criteria into the second criteria interface. 

7. The Internet search application of claim 6 wherein the
Internet search application further initiates the second search
module after transfer of search criteria, and returns at least
addresses of documents found by the second search function
in the first search function.

8. The search function of claim 6 wherein the first search
module allows natural language criteria entry, and parses
entries for significant words and phrases for matching to
content in electronic documents returned.

9. The search function of claim 8 wherein the first search
module, in step (e), tests the second search module for
criteria rules, and amends the first search criteria to conform
to the criteria rules.

10. The search function of claim 6 wherein the first search
module is provided by a subscription portal service, and is
operated by proxy by subscribers.

11. The search function of claim 10 wherein the first
search module is limited to returning first documents
pre-registered to a specific subscriber invoking the first search
function.

* * * * * 
SYSTEM FOR IMPROVING SEARCH AREA
SELECTION

The present application is related to three applications
filed on the same date herewith that are respectively entitled
and have serial numbers of SYSTEM FOR ENHANCING A
QUERY INTERFACE, Ser. No. 09/221,663; SYSTEM FOR
IMPROVING SEARCH TEXT, Ser. No. 09/221,659; and
COMPUTERIZED SEARCHING TOOL WITH SPELL
CHECKING, Ser. No. 09/221,028.

BACKGROUND OF THE INVENTION

The present invention relates to searching a network for
information. In particular, the present invention relates to
search tools used in computer searching.

Computer networks connect large numbers of computers
together so they may share data and applications. Examples
include Intranets that connect computers within a business
or institution and the Internet, which connects computers
throughout the world.

A single computer can be connected to both an Intranet
and the Internet. In such a configuration, the computer can
use data and applications found on any of its own storage
media such as its hard disc drive, its optical drive, or its tape
drive. It can also use data and applications located on
another computer connected to the Intranet or Internet.
Given the large number of locations from which a computer
can extract data and the increasing amount of storage
capacity at each of these locations, users have found it
increasingly difficult to isolate the information they desire.

In recent years, users have begun to use search engines to
help them search the Internet. Typically, search engines
accept a search query from the user and then look for the
search query's terms in an indexed list of terms. The indexed
list is generated by parsing text found on individual Internet
pages and indexing the text by the page's Uniform Resource
Locator (URL).

Since it is impossible to index every page on the Internet,
each search engine selects a set of pages to index. Since each
search engine is created by a different group of people,
different search engines index different sets of pages. In fact,
some search engines have become extremely specialized and
only index pages related to a specific category of
information such as sports or celebrities.

In addition, different search engines search through their
index in different ways and are optimized using different
query structures. Some search engines are optimized to
accept free-text queries. Others are optimized to accept
queries with logical operators such as "AND" and "OR".

The differences between various search engines are
largely unknown by average computer users. Therefore, they
are not able to determine which search engine would best
suit their searching goals. In addition, many of the
specialized search engines that index specific categories of pages
are unknown to average computer users. Therefore, users are
not fully utilizing the variety of search engines available on
the Internet.

Currently, there are no tools available to help computer
users identify which search engines they should be using to
optimize their search.

SUMMARY OF THE INVENTION

A method of aiding a user in searching a computer
environment includes retrieving a search query from a user,
accessing a user profile and selecting a search area based on
the search query and the user profile.

In other embodiments of the present invention, a method
provides for receiving a search query, categorizing at least
one term in the search query based on an indexed list of
terms derived from pages on a network, and identifying at
least one search area based at least in part on the category of
a search term. In still further embodiments, the search area
is selected based on a combination of the user profile and the
category of the search term.

Other aspects of the present invention include identifying
the scope of a search based on a search query retrieved from
a user and using the scope to identify a search area. The
scope relates to the level of detail that the user wants the
returned documents to provide. The scope is identified in a
number of different ways. In one embodiment, the scope is
based on the number of words in the search query.
In other embodiments, the scope is determined by
comparing the search query against a user profile. In still other
embodiments, the number of terms and the user profile are
combined to identify the search scope.

[illegible] method that categorizes search tools based on
their ability to search using search queries of selected
formats. The method [illegible] query's format [illegible]
of the tool [illegible]

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a plan view of a computing environment of the
present invention.

FIG. 2 is a block diagram of an architecture of an
embodiment of the present invention.

FIG. 3 is a flow diagram describing the initial processes
of an embodiment of the present invention.

FIG. 4A is an example of an initial display produced by
an embodiment of the present invention.

FIG. 4B is an example of an additional display produced
by an embodiment of the present invention.

FIG. 5 is an example display produced by the present
invention if a user wishes to go to a previous site.

FIG. 6 is an example text display with an animated
character in accordance with an aspect of the present
invention shown in conjunction with an Internet browser window.

FIG. 7A is an example display produced by the present
invention when a user wants to enter a new search.

FIG. 7B is an alternative example display produced by the
present invention when a user wants to enter a new search.

FIG. 7C is an example display produced by the present
invention showing spell-checking options provided by an
embodiment of the present invention.

FIG. 8 is a flow diagram of the central process of an
embodiment of the present invention.

FIG. 9 is a flow diagram showing a process for
performing a natural language parse under an embodiment of the
present invention.

FIG. 10 is a flow diagram for making a remote call to an
object located on a remote server under an embodiment of
the present invention.

FIG. 11 is a layout for an NLP block produced by a NLP
component under an embodiment of the present invention.

FIG. 12 is an example of a layout for the NLP data of one
search term in the NLP block.

FIG. 13 is a flow diagram of a process for identifying
possible topics under an embodiment of the present
invention.
FIGS. 14A and 14B are flow diagrams of a process
followed by a Topic Dictionary component under an
embodiment of the present invention.

FIG. 14C is a block diagram of components used in
connection with the Topic Dictionary component.

FIG. 15 is a flow diagram for constructing a Boolean
search query based on NLP data under an embodiment of the
present invention.

FIG. 16 is a flow diagram for submitting a search query
to a search area under an embodiment of the present
invention.

FIG. 17 is a flow diagram for training and using the
support vector machine of FIG. 2.

FIG. 18 is an example web companion display produced
in response to a search query directed toward a country or
continent.

FIG. 19 is an example web companion display produced
in response to a search query directed toward food.

FIG. 20 is an example web companion display produced
in response to a search query directed toward a non-famous
person's name.

FIG. 21 is an example web companion display produced
in response to a search query directed toward a famous
person's name.

FIG. 22 is an example web companion display produced
in response to a search query directed toward a company
name.

FIG. 23 is an example web companion display produced
in response to a search query directed toward an URL.

FIG. 24 is an example web companion display produced
in response to a search query directed toward a city.

FIG. 25 is an example web companion display produced
in response to a search query directed toward a restaurant.

FIG. 26 is an example web companion display produced
in response to a search query directed toward sound.

FIG. 27 is an example web companion display produced
in response to a search query directed toward pictures.

FIG. 28 is an example web companion display produced
in response to a search query having a narrow scope.

FIG. 29 is an example web companion display produced
in response to a search query having a broad scope.

FIG. 30 is an example web companion display produced
to provide alternative search suggestions.

FIG. 31 is an example of a search query with an ambiguity
as to time.

FIG. 32 is an example of a web companion display
produced to remove an ambiguity related to time.

FIG. 33 is an example of a search query with an exclusion
ambiguity.

FIG. 34 is an example of a web companion display
produced to remove an exclusion ambiguity.

FIG. 35 is an example of a search query with a
coordinating structure ambiguity.

FIG. 36 is an example of a web companion display
produced to remove a coordination structure ambiguity.

FIG. 37 is an example of a web companion display
produced to fine tune the search query if it does not contain
ambiguities.

DETAILED DESCRIPTION OF ILLUSTRATIVE
EMBODIMENTS

FIG. 1 and the related discussion are intended to provide
a brief, general description of a suitable computing
environment in which the invention may be implemented.
Although not required, the invention will be described, at
least in part, in the general context of computer-executable
instructions, such as program modules, being executed by a
personal computer. Generally, program modules include
routine programs, objects, components, data structures, etc.
that perform particular tasks or implement particular abstract
data types. Moreover, those skilled in the art will appreciate
that the invention may be practiced with other computer
system configurations, including hand-held devices,
multiprocessor systems, microprocessor-based or programmable
consumer electronics, network PCs, minicomputers,
mainframe computers, and the like. The invention may also be
practiced in distributed computing environments where
tasks are performed by remote processing devices that are
linked through a communications network. In a distributed
computing environment, program modules may be located
in both local and remote memory storage devices.

With reference to FIG. 1, an exemplary system for
implementing the invention includes a general purpose computing
device in the form of a conventional personal computer 20,
including a processing unit (CPU) 21, a system memory 22,
and a system bus 23 that couples various system components
including the system memory 22 to the processing unit 21.
The system bus 23 may be any of several types of bus
structures including a memory bus or memory controller, a
peripheral bus, and a local bus using any of a variety of bus
architectures. The system memory 22 includes read only
memory (ROM) 24 and random access memory (RAM) 25.
A basic input/output (BIOS) 26, containing the basic routine
that helps to transfer information between elements within
the personal computer 20, such as during start-up, is stored
in ROM 24. The personal computer 20 further includes a
hard disk drive 27 for reading from and writing to a hard disk
(not shown), a magnetic disk drive 28 for reading from or
writing to removable magnetic disk 29, and an optical disk
drive 30 for reading from or writing to a removable optical
disk 31 such as a CD ROM or other optical media. The hard
disk drive 27, magnetic disk drive 28, and optical disk drive
30 are connected to the system bus 23 by a hard disk drive
interface 32, magnetic disk drive interface 33, and an optical
drive interface 34, respectively. The drives and the
associated computer-readable media provide nonvolatile storage
of computer readable instructions, data structures, program
modules and other data for the personal computer 20.

Although the exemplary environment described herein
employs the hard disk, the removable magnetic disk 29 and
the removable optical disk 31, it should be appreciated by
those skilled in the art that other types of computer readable
media which can store data that is accessible by a computer,
such as magnetic cassettes, flash memory cards, digital
video disks, Bernoulli cartridges, random access memories
(RAMs), read only memory (ROM), and the like, may also
be used in the exemplary operating environment.

A number of program modules may be stored on the hard
disk, magnetic disk 29, optical disk 31, ROM 24 or RAM 25,
including an operating system 35, one or more application
programs 36, other program modules 37, and program data
38. A user may enter commands and information into the
personal computer 20 through local input devices such as a
keyboard 40, pointing device 42 and a microphone 43. Other
input devices (not shown) may include a joystick, game pad,
satellite dish, scanner, or the like. These and other input
devices are often connected to the processing unit 21
through a serial port interface 46 that is coupled to the
system bus 23, but may be connected by other interfaces,
such as a sound card, a parallel port, a game port or a
universal serial bus (USB). A monitor 47 or other type of
display device is also connected to the system bus 23 via an
interface, such as a video adapter 48. In addition to the
monitor 47, personal computers may typically include other
peripheral output devices, such as a speaker 45 and printers
(not shown).

The personal computer 20 may operate in a networked 
environment using logic connections to one or more remote 
computers, such as a remote computer 49. The remote 
computer 49 may be another personal computer, a hand-held 10 
device, a server, a router, a network PC, a peer device or 
other network node, and typically includes many or all of the 
elements described above relative to the personal computer 
20, although only a memory storage device 50 has been 
illustrated in FIG. 1. The logic connections depicted in FIG.
1 include a local area network (LAN) 51 and a wide area 
network (WAN) 52. Such networking environments are 
commonplace in offices, enterprise-wide computer network 
Intranets, and the Internet. 

When used in a LAN networking environment, the per-
sonal computer 20 is connected to the local area network 51
through a network interface or adapter 53. When used in a 
WAN networking environment, the personal computer 20 
typically includes a modem 54 or other means for establish-
ing communications over the wide area network 52, such as
the Internet. The modem 54, which may be internal or
external, is connected to the system bus 23 via the serial port 
interface 46. In a network environment, program modules 
depicted relative to the personal computer 20, or portions 
thereof, may be stored in the remote memory storage
devices. It will be appreciated that the network connections 
shown are exemplary and other means of establishing a 
communications link between the computers may be used. 
For example, a wireless communication link may be estab-
lished between one or more portions of the network.

The present invention provides a web companion that acts 
as an interactive searching aid for searching a computer 
environment, especially an environment that includes an 
Intranet or the Internet. The web companion is interactive in 
the sense that it provides the user with searching options
based on the search query provided by the user and previous 
searching options the user has selected. Some of the options 
provided by the web companion are possible search goals 
that the user may have, such as a person's e-mail address, or 
photographs of a celebrity. If the user selects one of the
goals, the web companion can automatically select an appro- 
priate search area and/or adjust the user's search query to 
improve the likelihood that the user will find what they are 
looking for. 

The web companion may be invoked in a number of
different ways. In a Windows 95®, Windows 98® or Win-
dows NT® based operating system provided by Microsoft
Corporation, the web companion can be invoked by
"double-clicking" on an icon appearing in the environment.
In addition, the web companion can be invoked from within
a browser such as Internet Explorer 4 (IE4) from Microsoft
Corporation. In particular, the web companion can be reg-
istered with IE4 so that IE4 opens the web companion in the
background when IE4 is opened. In such a configuration, the
web companion does not display an interface while it is
operating in the background. When the user enters a search
in IE4, either through a search engine on the Internet or
through the browser's search screen, the search is provided
to the web companion. The web companion then processes
the search through steps described below and determines
possible suggestions that would aid the user. In some
embodiments, the web companion then generates an inter-
face to display the suggestions to the user as described
below. In other embodiments, the web companion only 
displays an interface if the suggestions have a high prob- 
ability of being helpful to the user. When the web companion 
is invoked through IE4 in this manner, the web companion 
display disappears if the user does not adopt a suggestion 
made by the web companion. The web companion may also 
be stored on a remote server and invoked through a network 
connection to the remote server. 

FIG. 2 shows a component architecture for the present 
invention. The web companion is initiated by calling an 
executable application identified as WEB COMPANION 
200 in FIG. 2. WEB COMPANION 200 invokes an instance 
of IE4 control 202, which is an extendable hypertext mark- 
up language (html) interpreter produced by Microsoft Cor- 
poration. WEB COMPANION 200 also passes a .htm page 
denoted as DEFAULT.HTM 204 to IE4 control 202, thereby 
causing IE4 control 202 to execute the instructions in 
DEFAULT.HTM 204. 

The instructions in DEFAULT.HTM 204 include requests 
for instances of three ACTIVE-X controls: SEARCH- 
AGENT 206, QUERYENG 208, and TRUEVOICE 210. 
Each ACTIVE-X control includes methods that can be 
invoked by DEFAULT.HTM 204 and each ACTIVE-X 
control is able to fire events that are trapped by 
DEFAULT.HTM 204. 

QUERYENG 208 cooperates with DEFAULT.HTM 204 
and WEB COMPANION 200 to perform most of the func- 
tions of the present invention. SEARCH-AGENT 206 
generates, positions and animates a graphical character, 
shown as character 262 in FIG. 4B, based on method calls 
from DEFAULT.HTM 204. SEARCH-AGENT 206 also 
allows the user to move the animated character using an 
input device. When the animated character is moved by the 
user, SEARCH-AGENT 206 fires an event indicating the 
new position of the character, which is trapped by 
DEFAULT.HTM 204. 

TRUEVOICE 210 produces sounds based on method 
calls made by DEFAULT.HTM 204. Typically, these sounds 
are timed to coincide with the animation of the character 
produced by SEARCH-AGENT 206.

WEB COMPANION 200 generates a balloon, such as 
balloon 260 of FIG. 4B. The balloon is positioned on the 
screen based on the location of the animated character, 
which is provided to WEB COMPANION 200 by QUERY- 
ENG 208. Based on instructions in DEFAULT.HTM 204 or 
alternatively, instructions in Active Server Pages (.ASP) 
called by DEFAULT.HTM 204, IE4 control 202 displays 
text and control buttons in the balloon. An example of text 
displayed by IE4 control 202 is shown in FIG. 4B as text 261 
along with an example of a control button 263. Control 
button 263 may be activated by the user by positioning the 
cursor over the button and pressing an input device button. 

The Active Server Pages called by DEFAULT.HTM 
include HTML instructions. Although only three ASP files 
212, 214 and 216 are shown in FIG. 2, those skilled in the 
art will recognize that any number of .ASP files may be used 
in conjunction with DEFAULT.HTM 204. 

FIG. 3 is a flow diagram of the steps followed by the 
computer-executable instructions found in WEB COMPAN- 
ION 200, IE4 control 202, DEFAULT.HTM 204, SEARCH- 
AGENT 206, and QUERYENG 208. In an initial step 229, 
DEFAULT.HTM determines if this is the first time WEB 
COMPANION 200 has been invoked by this user. If it is the 
first invocation by this user, an introductory interface is 
provided at step 231 as shown in FIG. 4A. In FIG. 4A, IE4
control 202 displays introductory text 265, produced by
DEFAULT.HTM 204, in a balloon 267 produced by WEB
COMPANION 200. At the same time, SEARCH-AGENT
206 displays an animated character 269 next to the
introductory balloon.

If this is not the first invocation of WEB COMPANION
200, or after the display of the initial screen, the process
continues at step 228 where a first selection display is
produced by WEB COMPANION 200, DEFAULT.HTM
204 and SEARCH-AGENT 206. An example of this
display is shown in FIG. 4B, where an animated character
262 produced by SEARCH-AGENT 206 is shown next to a
balloon 260 produced by WEB COMPANION 200 that
contains text 261 and control buttons 263 produced by
DEFAULT.HTM 204 and IE4 control 202. In the selection
display of FIG. 4B, the user may either choose to perform a
new search or go to a previously visited site. Thus,
depending on what the user selects, the process either
continues at step 230 or 246.

If the user chooses to go to a previous site, the
computer-executable instructions follow step 230 to step
232, where they locate recently visited sites stored for this
user. In one embodiment, the recently visited sites are stored
in Registry 222 of FIG. 2, which is a memory location
maintained by many of the operating systems produced by
Microsoft Corporation. However, the recently visited sites
may be stored in any suitable memory location on the local
machine or a server. After locating the names of recently
visited sites, the computer-executable instructions proceed
to step 234, where the instructions locate the names of sites
that the user frequently visits. In one embodiment, these
sites are also stored in Registry 222.

At step 236, DEFAULT.HTM 204 causes IE4 control 202
to display a selectable list of recently visited sites and
frequently visited sites. An example of such a selectable list
is shown in FIG. 5 in balloon 264. The selectable list is
accompanied by animated character 266, which is produced
by SEARCH-AGENT 206.

The selectable list of balloon 264 includes selectable
entries for five recently visited sites 268, 270, 272, 274, and
276, and selectable entries for five frequently visited sites
278, 280, 282, 284, and 286. The selectable list also includes
an option to search the Internet. In many embodiments, the
names of the sites that appear in balloon 264 are the common
names for the sites. In other words, the Uniform Resource
Locators (URLs) for the sites normally do not appear in
balloon 264, since most users find it difficult to associate a
site's URL with its contents. However, to accommodate
users that want to see a site's URL, the present invention
provides a pop-up window that appears if the user pauses the
display caret over a site's name. An example of this is shown
in FIG. 5, where URL window 280 has opened for entry 270.
In FIG. 5, the caret is not shown so that entry 270 is not
obscured.

While the selectable list of balloon 264 is displayed,
DEFAULT.HTM 204 waits for the user to select one of the
listed sites in a step 237. If the user selects a site, the
computer-executable instructions follow step 238 to step
240.

In step 240, DEFAULT.HTM 204 calls a method in
QUERYENG 208 to pass a message to WEB COMPANION
200, asking WEB COMPANION 200 to locate or instantiate
an Internet browser such as IEXPLORE from Microsoft
Corporation. If one or more Internet browsers are open,
WEB COMPANION 200 selects the top browser. If there are
no open browsers, WEB COMPANION 200 opens a
browser. In FIG. 2, the opened browser is shown as
IEXPLORE 218. DEFAULT.HTM 204 passes the URL of the
selected site through QUERYENG 208 and WEB
COMPANION 200 to IEXPLORE 218 at step 242.

IEXPLORE 218 uses the site's URL to locate the site's
server over a network connection, such as the Internet, and
to make a request from the server for the site's content. The
located server, shown as server 219 in FIG. 2, returns the
requested content to IEXPLORE 218. As those skilled in the
art will recognize, the returned content can take many forms.
[illegible] receives from server 219 and displays the content
in a browser window. IEXPLORE 218 remains open until the
user closes the browser. This allows the user to perform
further Internet searching and viewing operations through
the browser. Such operations are separate and independent
of the operation of the web companion.

FIG. 6 presents a screen display where a web companion
balloon 300 and a character 304 appear on the same screen
as an internet browser window 306 created through the steps
described above. Browser window 306 is independent of
balloon 300 and character 304 and may be moved,
expanded, closed, and have its dimensions changed
independently of balloon 300 and character 304.

If at steps 228 or 237 of FIG. 3, the user selects to perform
a new search, the computer-executable instructions continue
at step 246. Step 246 leads to step 320 of an additional flow
diagram shown in FIG. 8.

At step 320 of FIG. 8, DEFAULT.HTM 204 causes IE4
control 202 to display a search interface. An example of such
a search interface is shown in FIG. 7A, where the interface
appears within a balloon 308 produced by WEB COMPANION
200 that appears adjacent animated character 310
produced by SEARCH-AGENT 206.

In addition to defining the search interface shown in FIG.
7A, DEFAULT.HTM 204 establishes an instance of a spell
checking object identified as SPELLCHECK 221 in FIG. 2.
DEFAULT.HTM 204 assigns a text box 312 in balloon 308
to SPELLCHECK 221 so that text entries and cursor
movements within text box 312 are passed directly to
SPELLCHECK 221. This allows SPELLCHECK 221 to
verify the spelling of words as they are entered by the user
and to suggest alternative spellings when the user places the
cursor over a word and activates a button on their mouse or
track-ball.

The search interface found in balloon 308 of FIG. 7A
includes a solicitation to the user to type in their search
request in a natural language or free text format. In these
formats, the user simply enters normal statements or
questions and does not need to include logical operators to
indicate the relationship between the terms of the search
query. Text box 312 displays the user's search query as the
user types and allows the user to modify their query. This
search solicitation process is represented by step 320 of FIG.
8.

FIG. 7B provides an alternative search solicitation display
to that shown in FIG. 7A. In FIG. 7B, a pull-down text box
250 is provided to accept and display the user's search text.
Pull-down text box 250 includes a pull-down activation
arrow 251 that causes a pull-down window 252 to be
displayed when activated. Pull-down window 252 displays
a selectable list of past search queries entered by the user and
allows the user to select a past search query by highlighting
it. Typically, past search queries are stored in Registry 222
of FIG. 2. However, they may be stored in any suitable
memory location.

By recording the user's past searches and by allowing the
user to review their past searches, the present invention
improves searching efficiency by reducing the likelihood
that the user will unknowingly reuse unsuccessful searches
or waste time trying to remember past successful searches.

While the user is entering their search query, the query is
spell checked by SPELLCHECK 221 at a step 322 of FIG.
8. If the search query includes a misspelled word,
SPELLCHECK 221 provides a visual cue to the user that a
word is misspelled. In many embodiments, this visual cue is
a red line underneath the misspelled word. FIG. 7A shows an
example of a visual cue 309 beneath the misspelled word
"amercan". In further embodiments, SPELLCHECK 221
displays a list of properly spelled words when the user
activates a button on their input device. An example of such
a display is shown in FIG. 7C where a selectable list 311 is
displayed by SPELLCHECK 221 in response to a button
being activated on an input device while the cursor is
positioned over the word "amercan". If the user selects one
of the properly spelled words, SPELLCHECK 221 auto-
matically replaces the misspelled word with the selected
word.
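
The suggest-and-replace behavior described above can be illustrated with a minimal sketch. The text does not say how SPELLCHECK 221 selects its candidate words, so the sketch assumes a small hypothetical word list and uses the similarity matching in Python's standard difflib module as a stand-in; the function names are illustrative only.

```python
import difflib

# Hypothetical word list; the dictionary actually used by SPELLCHECK 221
# is not described in the text.
DICTIONARY = ["american", "america", "mexican", "history", "music"]


def is_misspelled(word, dictionary=DICTIONARY):
    """Return True when the word is absent from the dictionary."""
    return word.lower() not in dictionary


def suggest(word, dictionary=DICTIONARY, limit=3):
    """Return up to `limit` dictionary words that resemble the input word."""
    return difflib.get_close_matches(word.lower(), dictionary, n=limit, cutoff=0.6)


def replace_word(query, misspelled, chosen):
    """Replace every occurrence of the misspelled word with the chosen one."""
    return " ".join(chosen if w == misspelled else w for w in query.split())


if __name__ == "__main__":
    query = "amercan history"
    if is_misspelled("amercan"):
        candidates = suggest("amercan")
        # Assume the user picks the first candidate from the list.
        print(replace_word(query, "amercan", candidates[0]))
```

A production spell checker would typically rank candidates with a language-specific model; the similarity cutoff of 0.6 here is simply difflib's default.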

Once the user has finished entering and modifying their 
query, they activate NEXT button 313 of FIG. 7A or NEXT 
button 253 of FIG. 7B, which causes the instructions of 
DEFAULT.HTM 204 to request the query text from 
SPELLCHECK 221 and to initiate processing of the query
text. Such processing begins at step 324 of FIG. 8, where the 
web companion performs a natural language parse (NLP) of 
the query text. The steps taken to perform the natural 
language parse are shown in detail in the flow diagram of 
FIG. 9.

The NLP process of FIG. 9 begins at step 450, where 
QUERYENG 208 of FIG. 2 replaces the spaces between 
words found in quotes in the user's query with underscores. 
At step 454, the search query is stored in a shared buffer 223 
of FIG. 2. QUERYENG 208 then makes a call to invoke the 
NLP component at a step 456. 
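
Step 450 can be sketched as a single regular-expression substitution. This is an illustration under assumptions, not the implementation in QUERYENG 208: the function name is hypothetical, and the sketch assumes that the quotation marks themselves are kept in the query.

```python
import re


def join_quoted_phrases(query):
    """Replace spaces inside double-quoted phrases with underscores.

    Text outside quotes is left untouched, so a quoted phrase travels
    through later processing as a single token.
    """
    def underscore(match):
        # match.group(1) is the text between one pair of double quotes.
        return '"' + match.group(1).replace(" ", "_") + '"'

    return re.sub(r'"([^"]*)"', underscore, query)


if __name__ == "__main__":
    print(join_quoted_phrases('recent "New York" restaurants'))
```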

The steps required to make the call to invoke the NLP 
component are shown in the flow diagram of FIG. 10. The 
steps of FIG. 10 begin at step 480 where, as shown in FIG.
2, WEB COMPANION 200 starts an instance of IEX- 
PLORE 224. WEB COMPANION 200 also passes a control 
file 225 to IEXPLORE 224. In step 482, control file 225 
causes IEXPLORE 224 to start a second instance of QUE-
RYENG denoted as QUERYENG 226 in FIG. 2. QUERY-
ENG 226 retrieves the search query stored in shared buffer
223 and packages the query to send it to the NLP compo- 
nent. 

In step 486 of FIG. 10, IEXPLORE 224 routes the 
package created by QUERYENG 226 to the NLP compo-
nent. If the NLP component is on client 199, the package is
routed directly to the component. If the NLP component is 
located on a remote server, the package is routed to an 
Internet Server Application Programming Interface 
(ISAPI.DLL). The ISAPI.DLL then routes the package to
the NLP component. In the embodiment of FIG. 2, NLP 
component 227 is located on a remote server 233, so the 
package is routed to an ISAPI.DLL 235, which routes it to 
NLP component 227. For clarity in the discussion below, 
NLP component 227 is used to describe the functions of the
NLP component. However, it should be recognized that 
these functions are not dependent on the location of the NLP 
component and an NLP component with the same capabili- 
ties may alternatively be located on the client under the 
present invention.

In step 488, the NLP component 227 performs natural 
language parsing functions on the search query. NLP com-
ponent 227 uses known logical and syntactic rules to iden-
tify respective parts of speech for each term in the search
query. NLP component 227 also identifies words that modify 
other terms in the search query and how words modify each 
other. In addition, NLP component 227 reduces each term in 
the search query to its most basic form and creates inflected 
and plural forms from the most basic form. NLP component 
227 is also able to identify the semantics of certain words 
and categorize them. For instance, NLP component 227 is 
capable of recognizing that the term "recent" is related to 
time. Other categories include city, state, country, continent, 
and proper name, etc.

NLP component 227 can also group together multiple 
words that represent a single conceptual item. For instance, 
NLP is able to identify the constituent parts of a date as 
belonging to a single date construct. To identify these 
"multi-word entries", NLP component 227 utilizes "fac- 
toids" and "captoids". Factoids are rules that identify multi- 
word entries on the basis of known facts. For example, NLP 
component 227 identifies "New Jersey" as a single multi- 
word entry because of the fact that New Jersey is a state. 
Captoids are rules that identify multi-word entries on the 
basis of the capitalization of terms in the query. For instance, 
if "Jack's Seafood Restaurant" is found in a search query, 
NLP component 227 will identify it as a multi-word entry on 
the basis of its capitalization. 
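
The two kinds of rule can be sketched as follows. The actual factoid and captoid rules of NLP component 227 are not given in the text, so this sketch makes loud assumptions: a hypothetical fact table of two-word names, and a captoid rule that groups any run of two or more capitalized words.

```python
# Hypothetical fact table; the real factoid rules are not described.
KNOWN_FACTS = {("new", "jersey"): "state"}


def group_multiword_entries(words):
    """Group adjacent words into multi-word entries.

    Factoid rule (assumed): two adjacent words found in KNOWN_FACTS.
    Captoid rule (assumed): a run of two or more capitalized words.
    """
    entries = []
    i = 0
    while i < len(words):
        pair = tuple(w.lower() for w in words[i:i + 2])
        if pair in KNOWN_FACTS:
            entries.append(" ".join(words[i:i + 2]))
            i += 2
            continue
        # Find the end of a run of capitalized words starting at i.
        j = i
        while j < len(words) and words[j][:1].isupper():
            j += 1
        if j - i >= 2:
            entries.append(" ".join(words[i:j]))
            i = j
        else:
            entries.append(words[i])
            i += 1
    return entries


if __name__ == "__main__":
    query = "find Jack's Seafood Restaurant in new jersey"
    print(group_multiword_entries(query.split()))
```

Checking the fact table before capitalization lets a known name be recognized even when the user types it in lower case.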

NLP component 227 returns a block of NLP data embed- 
ded in an HTML page that is routed back to IEXPLORE 
224. This is shown in FIG. 10 as step 488. At step 490, 
IEXPLORE 224 replaces control file 225 with the HTML 
page returned by NLP component 227. This causes QUE- 
RYENG 226 to close. At step 492, the returned HTML page 
causes another instance of QUERYENG (QE3) to start, 
which at step 494 places the returned NLP block in shared 
buffer 223. IEXPLORE 224 and QE3 then close at step 496. 
The final step in making the call to NLP component 227 is 
step 498 where original QUERYENG 208 retrieves the 
returned NLP information from shared buffer 223. 

After the call to the NLP component, the process of FIG.
9 continues at step 460, where the NLP block returned by the 
NLP component is parsed into its constituent parts. One 
embodiment of the NLP block structure is shown in FIG. 11, 
where NLP block 508 includes a data set for each NLP term. 
For example, NLP data for a first term is found in data set 
510, which is followed by a new-line marker (/N) 512. The 
NLP data for the terms are together positioned between 
matching markers 514 and 516 that include lines of dashes 
("-") that are terminated with new-line markers.
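
The first stage of the parse in step 460 can be sketched as locating the two marker lines and collecting the data sets between them. The sketch assumes only what is stated above, namely that the markers are lines of dashes and that each data set ends with a new-line marker; the function name and the sample text are hypothetical, and the layout of fields inside a data set is not modeled.

```python
def extract_data_sets(nlp_block):
    """Return the per-term data sets found between two dash-line markers.

    A marker is taken to be a non-empty line consisting only of dashes.
    If fewer than two markers are present, no data sets are returned.
    """
    lines = nlp_block.split("\n")
    marker_positions = [
        index for index, line in enumerate(lines)
        if line and set(line) == {"-"}
    ]
    if len(marker_positions) < 2:
        return []
    start, end = marker_positions[0], marker_positions[1]
    # Each remaining non-empty line is one term's data set.
    return [line for line in lines[start + 1:end] if line]


if __name__ == "__main__":
    sample = "-----\nfirst term data\nsecond term data\n-----\n"
    for data_set in extract_data_sets(sample):
        print(data_set)
```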

The NLP data for each term is normally of the form shown 
in FIG. 12 for data set 510. Data set 510 includes nine fields: 
WORD POSITION 518, WORD 520, PART-OF-SPEECH 
522, WHAT-IT-MODIFIES 524, HOW-IT-MODIFIES 526,
'AND'-or-'OR' SET 528, PLURAL 530, INFLECTED
FORMS 532, and NLP BITS 534. WORD POSITION 518 
contains the word's numerical location in the query and is in 
the form of an integer. WORD 520 and PART-OF-SPEECH 
522 provide the word itself and its part-of-speech in the 
query, respectively. WHAT-IT-MODIFIES 524 indicates the 
number of any word that the current word modifies in the 
query and HOW-IT-MODIFIES 526 indicates the manner in 
which it modifies these other words. Examples of entries in 
HOW-IT-MODIFIES 526 include noun-adjective (NADJ) 
relationships where an adjective modifies a noun. It can also 
include generic modifying relationships such as the case 
where a noun modifies another noun, rather than an adjective 
modifying a noun. An example of this would be "Whitewater
scandal" or "plant species". 'AND'-or-'OR' SET 528
indicates whether the term is part of a coordinating set based
on 'AND' or 'OR'. If the term is not part of such a set, the
value in this field will be -1. If the term is part of an 'AND'
set, the field will have a value between 0 and 99. If the term
is part of an 'OR' set, this field will have a value greater than
[illegible].

PLURAL 530 provides a plural form of the term if
appropriate and INFLECTED FORMS 532 provides any
inflected forms of the term, separated from each other by
commas. NLP BITS 534 provides semantic markers that
indicate semantic information about the term. Examples of
such markers include: "+tme" for terms related to time,
"+city" for terms identifying a city, "+nme" for a person's
name, "+neg" for a term providing a negative meaning,
"+vulgar" for vulgar terms, and "+food" for terms related to
food. The list above is only provided as an example and
those skilled in the art will recognize that other markers are
possible.

Returning to the flow diagram of FIG. 9, the parsing
function of step 460 parses the fields of each term into
program variables used by QUERYENG 208 and
DEFAULT.HTM 204. When the parse is complete, any
parsed words in the WORD field of the NLP block that are
"stop words" are deleted to form a set of keywords. "Stop
words" include words that occur so frequently in a language
that they have no significance in a search query. Examples
include articles such as "the" and "a", many prepositions,
and common verbs such as "have" and "be". The removal of
stop words is shown as step 462 in FIG. 9. In one
embodiment, stop words found in quoted phrases in the
user's query and stop words that appear in a multi-word
entry identified by NLP component 227 are not removed.

At step 464 of FIG. 9, NLP data for each of the terms is
checked to see if an inflected form returned by NLP matches
the term itself or its plural form. If there is a match, the
inflected form is deleted to remove redundancy in the NLP
data. At step 466, underscores are removed from between
multi-word entries that appear as a single term in the
returned NLP block. After step 466 of FIG. 9, step 324 of
FIG. 8 is complete and the conversion of the natural
language parse data into keywords has been accomplished.

At step 326 of FIG. 8, the keywords formed in step 324
are, if desired, used to modify the behaviors or animations
of the character. For instance, in one embodiment of the
invention, if one of the keywords is associated with the NLP
vulgar bit, the character is changed so that it blushes.

In step 328 of FIG. 8, the original search query, the
keywords found in step 324 and their associated NLP bits
are used to identify possible search topics. These search
topics represent broad categories of information that the
search query appears to be directed toward. The process of
identifying these categories is shown in greater detail in the
flow diagram of FIG. 13.

In first step 549 of FIG. 13, the keywords obtained in step
324 of FIG. 8 are stored in a shared buffer such as shared
buffer 223 of FIG. 2. In one embodiment, phrases that are in
quotes in the user's query appear unchanged and remain in
quotes in shared buffer 223. A Topics Dictionary component
is then called in step 550 using the technique described
above for invoking the NLP component. To invoke the
Topics Dictionary component using the technique described
above, the control file 225 passed to IEXPLORE 224 is
modified so that it causes the keywords to be passed to a
Topics Dictionary component instead of the NLP
component.

In the embodiment of FIG. 2, a Topics Dictionary 239 is
shown on server 233. In other alternative embodiments, the
Topics Dictionary is located on client 199 or on servers other
than server 233. Regardless of its location, Topics
Dictionary 239 receives a package of keywords from
IEXPLORE 224 and as shown in step 488 of FIG. 10,
performs functions on the terms in the package.

The operation of Topics Dictionary component 239 is
shown through flow diagrams in FIGS. 14A and 14B and a
block diagram. The block diagram shows the additional
components utilized by Topics Dictionary 239 to identify
possible topics based on the keywords of the search text. The
flow diagrams describe the process used by Topics
Dictionary component 239 to identify the topics.

In an initial step 600 of FIG. 14A, an executable denoted
as WEB-PARSE 962 in FIG. 14B, is initiated, which passes
a URL list 960 to Topics Dictionary component 239. URL
list 960 includes a set of Uniform Resource Locators for
pages located on the Internet and/or Intranet. In the list, each
URL is associated with one or more topics and with a
scripting function discussed further below. In step 601, a
database server 972, which forms part of Topics Dictionary
239, uses URL list 960 to generate a source database 961
that represents the associations found in URL list 960.

At step 602, WEB-PARSE 962 uses database server 972
to sequentially access the URL records stored in source
database 961. For each URL, WEB-PARSE 962 invokes a
browser 964, such as Internet Explorer 4 from Microsoft
Corporation. Browser 964 uses the URL to retrieve the
URL's page from a remote server 966 and to store the page
locally.

Once the page has been retrieved, WEB-PARSE 962 calls
scripting functions 963 that are associated with the URL in
source database 961. These scripting functions isolate
desired information in the URL's page using dynamic object
models of the HTML tags on the page. These object models
are created by HTML object modeler 965 in response to
method calls made by scripting functions 963.

The isolation functions performed by scripting functions
963 strip irrelevant information from a URL's page. Thus, if
a page contains a header, some opening text, and a list of
celebrity addresses, and the topic associated with the page is
celebrity addresses, the scripting functions can isolate the
celebrity addresses from the header and the opening text.
This is accomplished using the HTML object models, which
allow the scripting functions to manipulate the URL's page
based on HTML tags in the page. For example, the scripting
functions can retrieve all of the text lines associated with
<anchors> HTML tags by calling an HTML object method
that performs that function.

In most embodiments, the scripting functions treat the text
string associated with an individual HTML tag as an
individual entry. Thus, if a multi-word text string is associated
with an HTML tag, the entire text string is considered one
phrase. By storing text strings found on network pages as
single units, the present invention improves the precision of
the topics it returns. Thus, if "John Glen's Address" appears
on a celebrity page and "John's Apple Store" appears on a
shopping page, a search query for "John Glen's Address"
will only return a hit for the celebrity page and not for the
shopping page, even though both pages contain the word
"John". If the terms on the pages were stored individually,
both pages would produce a hit resulting in an erroneous
identification of the topic of the search.

For each entry isolated by scripting functions 963,
WEB-PARSE 962 places the entry and its associated topics in
index database 967 and/or HTML files 975. The entries that
are placed in index database 967 are placed there by making
database calls to database server 972 and can be accessed
later by making additional calls to database server 972. The
entries that are placed in HTML files 975 are placed there by
Topics Dictionary 239 so that they may be indexed by an
Index server 973 to form Index server files 974. Whether an
entry and its associated topics are added to the index
database or the Index server files is controlled by a property
in URL list 960. The topics for an entry may be any one of
or all of the topics listed for this URL page in source
database 961.

At step 603, WEB-PARSE 962 passes individual terms
found in a term list 968 to Topics Dictionary component 239.
Term list 968 includes a list of terms and phrases organized
by topics and allows terms to be associated with certain
topics in Index database 967 or Index server files 974 even
though the terms do not appear in a URL's page. For each
term or phrase in term list 968, database server 972 creates
an additional entry in Index database 967 and/or Topics
Dictionary 239 creates an additional entry in HTML text file
975 to be indexed by Index server 973.

In one embodiment of the invention, all forms of the
individual terms (e.g. plural, singular, past tense etc.) either
from a URL page or the term list are derived before being
stored. This derivation creates all of the linguistic forms of
each individual term. For example, if the term "televisions"
is found in the URL page, it is stemmed to provide both
"television" and "televisions".

Once the isolated entries for each of the URL pages listed
in source database 961 have been entered in Index database
967 and/or Index server files 974, the process of FIG. 14A
pauses at step 604 to wait for a search query. When a search
query 970 is received, the process continues at step 605 of
FIG. 14B where Topics Dictionary 239 divides the query
into sub-queries. Specifically, each quoted phrase in the
user's query and each multi-word entry designated by NLP
component 227 are set as separate sub-queries. Once the
phrase and multi-word entries have been grouped into
sub-queries, the remaining terms in the user's query are
grouped as a single sub-query.

Each sub-query found above is sequentially processed
through the remaining steps of FIG. 14B. In step 606, one of
the sub-queries is selected and the number of terms in the
sub-query is used to initialize a search length variable "N".

these smaller units is found in Index database 967 and/or
Index server files 974 at step 610, the associated topics 971
are returned at step 612 and the terms corresponding to the
matching phrases are removed from the sub-query. If both of
these smaller units are found in Index database 967 and/or
Index server files 974 at step 610, the associated topics 971
for both units are returned at step 612 and all of the terms of
the sub-query are removed.

If neither of these smaller units is found in Index database
967 at step 610, the length variable N is again reduced by
one at step 611. If N is still greater than one at step 607,
Topics Dictionary 609 searches for all phrases of length N
found in the search query. Using the example above, this
produces three units (A B) (B C) and (C D), which are each
searched for in Index database 967.

Steps 607, 609, 610 and 611 are repeated until a query unit
is found in Index database 967 or Index server files 974, or
until N equals one.

When N equals one at step 607 or after topics have been
returned at step 612, the process continues at step 608 where
N is reset to equal the number of terms remaining in the
sub-query. This number will be less than the number of
terms originally in the sub-query if terms were removed in
step 612 after a match was found. At step 608, only those
terms that did not have a match are left in the sub-query.

At step 613, N is compared to one and if it is greater than
one, Topics Dictionary 239 places a Boolean "AND"
between each term of the sub-query at step 614. Continuing
the example above, this produces a logical search string (A
AND B AND C AND D). Topics Dictionary 239 then
searches for strings that have each of the terms of the logical
search string. Any string that has all of the search string's
terms, regardless of the order of the terms in the string, will
be considered a match at step 615. If there are no strings that
match the logical search string at step 615, N is reduced by
one at step 616 before being compared to "1" at step 613.

If N is greater than one at step 613, step 614 is repeated
using only N terms in each combination. Using the example
above with N reduced to "3", Topics Dictionary 239 searches
based on four logical search strings (A AND B AND C), (A
AND B AND D), (A AND C AND D), and (B AND C AND
D). If none of these logical search strings result in a match
at step 615, then steps 616, 613, 614, and 615 are repeated
until there is a match or until N equals one. If there is a

At step 607, N is compared to "1" to determine if there is match at step 615, Topics Dictionary 239 returns the 

more than one term in the sub-query. If there is more than matches to IEXPLORE 224 in a topic list embedded in an 

one term, Topics Dictionary 239 uses database server 972 HTMLpage at step 617. Topics Dictionary 239 also removes 

and/or Index server 973 to search through Index Database the terms of the matching logical search string from the 

967 and/or Index server files 974 for the entire sub-query at 50 sub-query. 

step 609. If one or more matches are found for the entire If N is equal to one at step 613, or after topics are returned 

sub-query at step 610, the associated topics 971 are returned at step 617, the process continues at step 618, where Topics 

to IEXPLORE 224 at step 612. In one embodiment, the Dictionary 239 searches for each remaining term of the 

topics are returned in an HTML page. However, those sub-query on an individual basis. If there is at least one 

skilled in the art will recognize that the topic may be 55 match at step 619, Topics Dictionary 239 determines if there 

returned in any suitable format. If a match cannot be found are fewer matches than a maximum number at step 620. In 

for the entire sub-query at step 610, the length variable N is the embodiment of FIG. 14B, the maximum number is 

reduced by one at step 611 and control returns to step 607. twenty but other numbers are possible. If there are fewer 

If N is again greater than one at step 607 the length variable than the maximum number of matches, Topics Dictionary 

N is reduced by one at step 611 and control returns to step 60 239 returns the corresponding topics at step 621. If more 

607. than one term of the sub-query matches, the corresponding 

If N is again greater than one at step 607, Topic Dictionary topics are returned for each term. 

239 searches for all phrases of length N found in the la one embodiment, the topic lists returned at steps 612, 

sub-query. Thus, if the initial query was (ABC D), where 617 and 621 include lists of the matching topics, the number 

A, B, C, and D are each terms, the first reduction in size 65 of matches for each topic, and weighting bits that indicate if 

produces two search units (ABC) and (B C D) that are the keywords match an entire string stored in Index database 

searched for individually in Index database 967. If either of 967 or Index Server Files 974. Thus, if the user's query is 
"Tom Hanks in Saving Private Ryan", and "Saving Private
Ryan" is stored under the MOVIE topic, a weighting bit
would be returned with the topic MOVIE in the topic list.

After the topic list is returned at step 621 or if there are
no matches at step 619 or if there are more than the 
maximum number of matches at step 620, Topics Dictionary 
239 checks to see if there are more sub-queries to process at 
step 622. If there are more sub-queries, the process returns 
to step 606. If there are no more sub-queries, the process 
returns to step 600 of FIG. 14A to await a new user query. 
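
The order-independent matching of steps 613 through 617 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the index is modeled as a plain dictionary mapping a stored string to a topic, and the names `and_combination_search` and `index` are hypothetical.

```python
from itertools import combinations

def and_combination_search(terms, index):
    """Try ever-smaller AND-combinations of terms (steps 613-617).

    index is a hypothetical dict: stored string -> topic.
    Returns (matched topics, terms left unmatched).
    """
    n = len(terms)
    while n > 1:
        for combo in combinations(terms, n):
            wanted = set(combo)
            # A stored string matches when it holds every term, in any order.
            topics = [topic for stored, topic in index.items()
                      if wanted <= set(stored.split())]
            if topics:
                # Matched terms are removed from the sub-query.
                remaining = [t for t in terms if t not in wanted]
                return topics, remaining
        n -= 1  # step 616: shrink the combination size
    return [], list(terms)
```

Whatever terms remain after this pass would then be searched individually, as in step 618.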

The progressive reduction in size of the search units 
described above improves the precision of the topics 
returned by the present invention. Specifically, since a match 
on a multi-word part of the query is likely to be more 
relevant to the query than a match on a single word, the 
invention is more likely to return relevant topics than if 
searching was simply performed on the individual terms of 
the query. 
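
The progressive reduction of contiguous search units (steps 607 through 612) can be sketched in the same way. Again this is only an illustration: the index is assumed to be a dictionary from a stored phrase to a topic, and `phrase_reduction_search` is a hypothetical name.

```python
def phrase_reduction_search(terms, index):
    """Search for the longest contiguous phrase first (steps 607-612).

    index is a hypothetical dict: stored phrase -> topic.
    Returns (matched topics, terms left in the sub-query).
    """
    n = len(terms)
    while n > 1:
        # All contiguous units of length n, e.g. (A B C) and (B C D).
        for start in range(len(terms) - n + 1):
            unit = terms[start:start + n]
            topic = index.get(" ".join(unit))
            if topic is not None:
                remaining = terms[:start] + terms[start + n:]
                return [topic], remaining
        n -= 1  # step 611: reduce the length variable
    return [], list(terms)
```

For the query "tom hanks saving private ryan" and an index holding "saving private ryan" under MOVIE, the sketch matches the three-word unit before any single word is tried, which is the behavior the paragraph above credits with improved precision.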

To facilitate a dynamic database that includes the latest 
news events and latest additions to the network, the present 
invention periodically returns to step 602 from step 604 to 
retrieve updated versions of the pages on the network. The 
refreshed pages are then stripped to build an updated index 
as discussed above. 

Through the processes described above in connection 
with Topics Dictionary 239, the present invention is able to 
automatically generate a list of indexed terms organized 
under topics. This list includes terms that have recently 
entered public discourse such as the names of people who 
were otherwise unknown to the general public one month 
earlier. This aspect of the present invention allows possible 
search goals to be identified based on search terms that are 
new to the public vocabulary. It also allows possible search 
goals to be identified based on new events. If there's a
hurricane named Mitch in Florida and someone types in
"hurricane Mitch", the present invention can determine that
the searcher might be interested in the recent news on the 
Florida hurricane. 

Returning to FIG. 10, after the topics have been returned 
by database server 239 at step 488, the remaining steps of 
FIG. 10 are executed thereby finishing step 554 of FIG. 13. 
The process of FIG. 13 then continues at step 556 where the 
topic list returned by database server 239 is stored for later 
use. 

At step 558 of FIG. 13, QUERYENG 208 generates a 
topic list based upon the NLP bits produced during the 
natural language parse of the search query. As noted above, 
many terms returned in the NLP block are associated with a 
set of NLP semantic bits or flags. The topics are selected 
based upon associations between the topics and the NLP 
semantic bits. Table 1 provides examples of such associa- 
tions found in embodiments of the present invention. 

TABLE 1

NLP BIT      TOPIC

+Nme         First name
+Compny      Business
+Cty         City
+Stte        State
+Cntry       Country
+Contnt      Continent
+Url         URL
+Email       E-mail
+Wthr        Weather
+Food        Food

Once topics have been identified for the NLP semantic 
bits returned by the NLP parse, the process of FIG. 13 
continues at step 560 where the topics based on Topics 
Dictionary 239 and the NLP semantic bits are combined to 
identify the most likely topic for the search. In most 
embodiments, the topic lists produced by Topics Dictionary 
239, and the NLP semantic bit process include the number 
of terms from the search query that have been found under 
each topic. To combine these types of topic lists, the number 
of hits under similar topics in different topic lists are added 
together in some embodiments. Thus, if the Topics Dictio- 
nary list included three hits under the topic Celebrity, and the 
NLP semantic bit list included two hits under the topic 
Celebrity, the combined topic list would include five hits 
under the topic Celebrity. In other embodiments, the topic 
lists remain separate and are used as independent clues to 
determine the user's goal. 

When the topics are combined, the combined topic list is 
sorted at step 562 of FIG. 13. The sorting places the topic 
with the most hits at the top of the combined topic list. The 
combined topic list is then returned at step 564 to complete 
the process represented by box 328 of FIG. 8. If the topics 
are not combined, the individual topic lists are returned at 
step 564. 
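
The combining and sorting of steps 560 and 562 can be sketched as below, for the embodiments that add hit counts together. The dictionary inputs and the name `combine_topic_lists` are assumptions made for illustration.

```python
from collections import Counter

def combine_topic_lists(dictionary_hits, nlp_hits):
    """Add hit counts for matching topics, then sort by hits (steps 560-562)."""
    combined = Counter(dictionary_hits) + Counter(nlp_hits)
    # most_common() orders topics from most hits to fewest.
    return combined.most_common()
```

With three Celebrity hits from the Topics Dictionary list and two from the NLP semantic bit list, the combined list places Celebrity first with five hits.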

At step 330 of FIG. 8, QUERYENG 208 designates the 
keywords as the search terms to be used during searching. 
By reducing the user's search query to just its keywords, the
present invention improves the efficiency of the search 
performed by the selected search service. However, the 
original search query may also be used during the search. 

At step 332, QUERYENG 208 selects a search area or 
repository where the search is to be performed. When 
operating on the searcher's initial search query the search 
area is a generic search engine. In particular, the search area 
is the best vector-space search engine available. A vector- 
space search engine is one that ranks documents on the 
number of times a term in the search query appears in the 
document, discounting those terms that are common to a 
large number of documents and giving priority to documents 
that have the term in the title or the first paragraph of the 
document. Each vector-space search engine has its own 
algorithm for weighting these different factors. In most 
embodiments, the initial search engine selected at step 332 
is stored in Registry 222 of FIG. 2. 
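
The three factors named above (term count, a discount for common terms, and priority for title matches) can be illustrated with a small scoring function. Because each vector-space search engine has its own weighting algorithm, everything here is an assumption: the logarithmic discount, the `title_boost` factor, and the document layout are hypothetical choices, not those of any particular engine.

```python
import math

def score(query_terms, doc, doc_freq, num_docs, title_boost=2.0):
    """Score one document: term frequency, discounted by how common
    the term is across documents, with a boost for title matches.

    doc is a hypothetical dict with 'title' and 'body' word lists.
    doc_freq maps a term to the number of documents containing it.
    """
    total = 0.0
    for term in query_terms:
        tf = doc["body"].count(term)
        if tf == 0 and term not in doc["title"]:
            continue
        # Inverse document frequency: rare terms count for more.
        idf = math.log(num_docs / (1 + doc_freq.get(term, 0))) + 1.0
        weight = title_boost if term in doc["title"] else 1.0
        total += weight * max(tf, 1) * idf
    return total
```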

After the search area is selected in step 332, QUERYENG 
208 determines if the search should be converted into a 
logical search query, also known as a logical function query, 
at step 334. Logical searches have logical operators between 
search terms that indicate the relationship between the terms. 
Examples of logical operators include "AND" to indicate 
that two terms should appear together in a document, "OR" 
to indicate that one of the terms or both of the terms should 
appear in the document, "NOT" to indicate that a term 
should not appear in the document, and quotes to indicate 
that the exact phrase found between the quotes should 
appear in the document. 

For the initial query, a logical search is not constructed 
since the query is being submitted to a vector-space search 
engine and such search engines work better if they do not 
have to deal with logical operators. Later in the process, after 
possible search goals have been identified, the determination of
whether to construct a logical search query is based largely on the
user's search goal. For search goals that involve specific items,
the present invention first attempts to locate a search area
dedicated to the item. For example, if the user is looking for a
celebrity's address, the present invention will look for a search
area dedicated to celebrity addresses. If such a search area cannot
be found, the present invention will convert the search query into
a logical search query and submit it to a search engine that
supports such queries.

If a search area exists that is targeted at the user's goal and the
search area works better when it receives logical queries, the
present invention will convert the query into a logical query. If
the search area works better when it receives free text search
queries, the query is not converted into a logical search query.

To determine if a particular search area is better suited to
receiving logical search queries or free text search queries,
QUERYENG 208 accesses a table that indicates the preferred search
query form for a number of search areas. In most embodiments, this
table is stored in default.htm.

Constructing a logical query is shown as step 336 of FIG. 8 and
under embodiments of the present invention this step relies on the
NLP data returned as a result of the natural language parse
performed in step 324. The specific process for constructing the
logical search is described in the flow diagram of FIG. 15, which
begins with a step 700. In step 700, logical operators for the
selected search area are identified by QUERYENG 208 because
different search areas use different logical operators. For
example, in some search areas the logical operator "AND" is
represented by "+". QUERYENG 208 keeps a list of the characters
and/or strings used by various search areas to represent the
logical operators. This provides a significant advantage in the
present invention because the user does not need to remember the
specific operators used by a search area in order to efficiently
use the search area. The present invention automatically inserts
the proper operators for the search area.

For simplicity in the discussion below, the invention is described
using the Boolean tags: "AND", "OR", "NOT", "NEAR", and "( )".
However, those skilled in the art will recognize that the present
invention actually inserts the search area's character or string in
the logical searches constructed below. Thus, if the term "AND" is
used in the discussion below and the selected search area
represents "AND" using the "+" character, a "+" will actually be
inserted in the logical search instead of the "AND".

In step 702 of FIG. 15, the NLP data is examined to group terms
that are in a coordinating relationship into a coordinating (CRD)
set. Terms in a coordinating relationship are connected together by
the words "or" or "and". For each CRD set at step 702, a set of
operations is performed as shown in box 704 of FIG. 15.

The first operation in box 704 is actually a set of operations that
are repeated for each CRD term in the CRD set. The operations
performed for each CRD term are found in box 706 and the fact that
these operations are repeated is represented in box 708.

The first operation in box 706 is step 710 where the current CRD
term is inserted into the Boolean query along with any inflected
forms of the CRD term returned by NLP. The CRD term and its
inflected forms are separated from each other by "OR", and the
complete string is encased in parentheses to produce: (CRD-TERM OR
INFLECTED-FORM-#1 OR INFLECTED-FORM-#2 . . . OR INFLECTED-FORM-#N),
which is referred to as a base string.

At step 712, each of the terms in the search query that modify the
current CRD term are identified by QUERYENG 208 based on the NLP
data. For each modifying term, the operations in block 714 are
repeated. Specifically, step 716 of block 714 is repeated where the
modifying term is inserted in the Boolean query along with any
allowed inflected forms of the modifying term. The modifying term
and its allowed inflected forms are separated from each other by
"OR", and the entire string is encased in parentheses to produce:
(MODIFYING-TERM OR INFLECTED-FORM-#1 OR INFLECTED-FORM-#2 . . . OR
INFLECTED-FORM-#N), which is referred to as a modifying string.

Note that the present invention is able to discriminate between
terms that should have their inflected forms included in the
Boolean search and terms that should not have their inflected forms
included in the Boolean search. Specifically, QUERYENG 208 filters
inflected forms of modifying terms that are grammatically
incorrect. Thus, a modifying term such as black, as in the phrase
"black bird", would not have its inflected form "blacks" included
in the Boolean query. In fact, under the present invention, most
modifying terms will not have an inflected form included in the
Boolean query. In addition, proper nouns, such as Seattle, are not
inflected by the present invention. This avoids adding non-existent
terms, such as Seattles, to the Boolean query.

In step 718, either a "NEAR" or an "AND" is placed between each of
the modifying strings as well as between the base string and its
neighboring modifying strings. The decision between using "NEAR"
and "AND" is made on the basis of the phrase strength currently
employed by the web companion and on whether the chosen search
service supports NEAR. The phrase strength is initially set at a
strong default setting that would cause "NEAR" to be inserted
between the modifying strings. However, the user may change this
phrase strength during the searching process so that "AND" is used
instead of "NEAR" by accepting a suggestion from the Web Companion
to make this change.

In step 720, the modifying strings, the base string and their
connecting logical operators are encased in parentheses to produce,
in the default case: (Base-String AND MODIFYING-STRING-#1 . . . AND
MODIFYING-STRING-#N), which is identified as a CRD term string.

In step 722, Boolean operators are placed between each of the CRD
term strings. The Boolean operators are selected on the basis of
the coordinating relationship of the current CRD set. If the
coordinating relationship is based on "or", an "OR" Boolean
operator separates each CRD term string. If the coordinating
relationship is based on "and", an "AND" Boolean operator separates
each CRD term string. Note that after the initial query has been
searched, the query may be refined by the present invention by
asking the user to clarify the meaning of "and" in the search
query. This process is discussed below in connection with FIGS. 35
and 36. Based on the clarification made by the user, an "OR"
Boolean operator may be used between CRD strings that are based on
"and".

In step 724, the CRD term strings and their corresponding Boolean
operators are encased in parentheses to produce:
(CRD-term-string-#1 AND/OR CRD-term-string-#2 . . . AND/OR
CRD-term-string-#N), which is referred to as a CRD set string.

In step 726, each CRD set string is separated from all other CRD
set strings in the Boolean query by inserting the Boolean operator
"AND" between each CRD set string.
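
The string-building steps of FIG. 15 (steps 700 through 724) can be sketched as a few small helpers. This is a minimal illustration under stated assumptions: the `OPERATORS` table, the search-area names "generic" and "plus_style", and all function names are hypothetical, and the inflected forms are supplied by the caller rather than by a natural language parser.

```python
# Hypothetical per-search-area operator table (step 700).
OPERATORS = {
    "generic": {},                 # uses AND / OR / NEAR as written
    "plus_style": {"AND": "+"},    # a search area that writes AND as "+"
}

def or_group(term, inflected):
    """Base or modifying string: (TERM OR FORM-1 OR ... FORM-N)."""
    return "(" + " OR ".join([term] + inflected) + ")"

def crd_term_string(base, modifiers, link="AND"):
    """Steps 710-720: base string joined to its modifying strings.

    link would be "NEAR" when phrase strength is strong and the
    search area supports it.
    """
    parts = [or_group(*base)] + [or_group(*m) for m in modifiers]
    return "(" + f" {link} ".join(parts) + ")"

def crd_set_string(term_strings, relation):
    """Steps 722-724: join CRD term strings by their coordination."""
    op = "OR" if relation == "or" else "AND"
    return "(" + f" {op} ".join(term_strings) + ")"

def localize(query, area):
    """Swap the generic tags for the search area's own operators."""
    ops = OPERATORS[area]
    return " ".join(ops.get(tok, tok) for tok in query.split())
```

For the phrase "black bird", `crd_term_string(("bird", ["birds"]), [("black", [])])` yields `((bird OR birds) AND (black))`; the modifier "black" carries no inflected form, in line with the filtering described above.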
At step 728, the remaining stand-alone terms in the search query
are added to the Boolean query. This is accomplished by repeating
the steps found in box 730 for each stand-alone term. The first
step in box 730 is step 732, which inserts the stand-alone term
after the last CRD set string. In addition, step 732 inserts any
inflected forms of the stand-alone term. A Boolean "OR" is inserted
between the stand-alone term and each inflected form. In step 734,
the stand-alone term, its inflected forms, and the Boolean "OR"
operators are encased in parentheses to produce: (STAND-ALONE-TERM
OR Inflected-form-#1 . . . OR Inflected-form-#N).

At step 736, the current stand-alone term's respective NLP bits are
examined to determine if the term is associated with a +NEG bit.
This bit indicates that in the original search query the current
term was modified by another term in such a way that it appears the
user wants to exclude documents that include the current term. If
the current term is not associated with a +NEG bit, a Boolean "AND"
is inserted before the open parentheses of the stand-alone string
at step 740. If the current term is associated with a +NEG bit, a
Boolean "NOT" is inserted before the open parentheses of the
stand-alone string at step 738. As with CRD relationships, the
query may be refined by asking the user to clarify what they
intended a negative modifier to signify. This process is discussed
further below in connection with FIGS. 33 and 34. Based on the
user's clarification, an "AND" may be used instead of "NOT" even
though there is a negative modifier.

Once all of the stand-alone terms have been added, the process of
constructing the Boolean search query is complete.

Note that in the discussion above, QUERYENG 208 treats multi-word
entries returned by the natural language parse as a single term
that is placed in quotes or is otherwise grouped as a Boolean
phrase. Thus, the multi-word entry "Microsoft Corporation" would be
considered a single term in the discussion above and would appear
within phrase markers in the constructed Boolean query.

After the Boolean search has been constructed at step 336 of FIG. 8
or if a Boolean is not to be constructed at step 334, the process
continues at step 338 where QUERYENG 208 determines if the search
query is to be modified. Typically, such modifications result from
user selections made in response to web companion displays
discussed further below. If the query is to be modified at step
338, the process continues at step 340 where the query is
appropriately modified.

After the query is modified in step 340, or if the query is not to
be modified in step 338, the search query, either logical or free
text, is submitted to the selected search area at step 342. The
process of submitting the search to a search area located on the
Internet is shown in detail in the flow diagram of FIG. 16.

In step 760 of FIG. 16, QUERYENG 208 asks WEB COMPANION 200 to
locate or establish a top instance of IEXPLORE, shown as IEXPLORE
218 in FIG. 2. QUERYENG 208 then generates a URL based on the
selected search area and the search query, and passes the URL to
WEB COMPANION 200 at step 762. At step 764, WEB COMPANION 200
passes the URL to IEXPLORE 218, which uses the URL to locate the
server containing the desired search area and to pass the search
query to the search area at step 766. When the search area
completes its search, its respective server returns the search
results to IEXPLORE 218 at step 768.

When the search results are returned to IEXPLORE 218, they are
displayed by IEXPLORE 218 in an independent browser window. This
step is represented in FIG. 8 as step 350.

After the search has been submitted at step 342, but before the
results have been returned, QUERYENG 208 continues to operate at
step 344, where it uses the possible topics determined in step 328,
along with user profile clues, search scope clues and past web
companion interactions to determine suggestions to be displayed in
the next web companion screen produced by WEB COMPANION 200,
QUERYENG 208 and DEFAULT.HTM 204. Thus, each of the items used to
identify possible topics as well as user profile clues, search
scope clues and past web companion interactions are all features of
a clue stream for identifying search suggestions.

The user profile clues include such things as the user's age, their
search history, their gender, things they have deemed as favorites,
things in their browsing cache and their expertise level. The user
profile may be constructed by asking the user for information or by
tracking the user's interests based on the searches the user enters
or the types of pages the user views.

The search scope clues provide an indication of what level of
information the user is seeking. In other words, is the user
looking for specific information or just a general overview? In one
embodiment, the number of words in the user's initial search text
provides a scope clue. Specifically, if the search text includes
two or fewer words, the search is considered to have a broad scope.
Other criteria for determining scope include broadly worded
questions having phrases such as "tell me about . . ." or "give me
information about . . ." that indicate a broad scope, or narrowly
worded questions such as "who was . . ." or "when did . . ." which
indicate a narrow scope. In addition, the user's past searches may
be used to determine scope since a searcher is more likely to want
detailed information about a subject if they have searched it
before.

The past web companion interactions are used to avoid presenting
the user with the same suggestions more than once and to provide a
logical progression in the types of suggestions provided to the
user.

The user profile clues, search scope clues and past web companion
interactions each provide interaction characteristics that indicate
how the user wants to interact with the web companion. For example,
based on the user profile, the web companion can determine whether
the user is a child and thus whether the user expects the web
companion to interact on a simpler level. From the search scope
clues, the web companion can determine if the user is using the web
companion for browsing or to find specific information. From the
past web companion interactions, the web companion can identify the
types of suggestions that the user is most likely to be interested
in seeing next.

In most embodiments, the suggestions that are likely to be most
helpful to the user are provided first, with less helpful
suggestions provided in later screens. In accordance with this
philosophy, some embodiments of the present invention try to
provide suggestions based on a user's possible search goals first.

To identify possible search goals, the present invention uses a
support vector machine (SVM) 209 of FIG. 2 that treats each of the
clue stream features as a component of a feature vector also known
as a goal vector. The support vector machine compares the query's
goal vector to a number of goal surfaces in an n-dimensional goal
space defined by n features. Each n-dimensional goal surface
separates the goal space into two sections. If a query's goal
vector is in one section, the user has the particular search goal
associated with the goal surface. If the query's goal
vector is in the other section, the user does not have the
particular search goal. For example, a "celebrity" goal
surface may divide the goal space into a first section that 
indicates that the user's goal concerns a celebrity and a 
second section that indicates that the user's goal does not 
concern a celebrity. In addition, based on the distances 
between the query's goal vector and a goal surface, the SVM 
is able to return the probability that the user has a particular 
search goal. 

The training and use of SVM 209 is shown in the flow 
diagram of FIG. 17. The steps required to train SVM 209 are 
shown in training box 988. Through these steps, SVM 209
defines the goal vector space and populates it with goal 
surfaces. The training begins at step 989 where a person 
manually analyzes a corpus of queries to assign each query 
to between one and four potential goals. To do this, the 
person looks at each query and attempts to determine the 
user's search goal from the query. 

The corpus of queries is then submitted to QUERYENG 
208 at step 990. QUERYENG 208 generates a list of features 
for each query including NLP semantic bits, a list of topics, 
etc. Advanced embodiments include user profile features 
associated with the user who generated the training query. 
For each query, this list of features and the associated 
potential tasks for that query are then submitted to SVM 209 
at step 991. 

SVM 209 generates the goal vector space in step 992 by 
converting each set of features into a vector in the goal 
vector space. The resulting goal vector space is then divided 
by a set of goal surfaces based on the goals identified for 
each training vector. Techniques for generating these goal 
surfaces are discussed in greater detail in a pending patent 
application entitled METHODS AND APPARATUS FOR 
BUILDING A SUPPORT VECTOR MACHINE 
CLASSIFIER, filed on Apr. 6, 1998, and having Ser. No. 
09/055,477, which is hereby incorporated by reference. In 
most embodiments, the surfaces are represented by equa- 
tions that define hyper-planes, which extend through the 
goal space. 

After SVM 209 has been trained, it is ready to be used to 
identify possible goals of a new search query. The steps 
involved in using SVM 209 are shown within box 993 of 
FIG. 17. 

In step 994, the new search query is submitted to QUE- 
RYENG 208, which identifies a set of features using the 
techniques described above. The features are submitted to 
SVM 209 at step 995 and SVM 209 converts the features 
into the query's goal vector. 

At step 996, SVM 209 determines where the query's goal 
vector resides in the goal space relative to the goal surfaces. 
In particular, for each goal surface, SVM 209 determines if 
the query's goal vector is on the surface's "positive" side 
indicating that the user's actual search goal is the search goal 
associated with the surface, or the surface's "negative" side 
indicating that the user's search goal is not the search goal 
associated with the surface. 

In addition, SVM 209 determines the distance between 
the query's goal vector and each of the goal surfaces in the 
goal space. The distance measurement can weight all fea- 
tures equally or can give additional weight to certain 
features, such as topics.

Based on the relative distances between the query's goal 
vector and each of the goal surfaces, SVM 209 assigns 
probabilities to each goal. Thus, if the query's goal vector is 
located next to a number of goal surfaces, there is a low 
probability that any one goal is the user's actual goal. If the 
query's goal vector is far from a particular goal surface and 
is on the positive side of the goal surface, there is a high 
probability that the associated goal is the user's actual goal. 
In step 997, SVM 209 returns each of the calculated
probabilities to QUERYENG 208 for further processing as
described below.
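
The geometry of steps 996 and 997 can be illustrated for the case where each goal surface is a hyper-plane. This is only a sketch: the source does not give the formula that turns distances into probabilities, so the logistic function used below is an assumption, as are the names `signed_distance`, `goal_probabilities` and the layout of `surfaces`.

```python
import math

def signed_distance(vector, weights, bias):
    """Signed distance from a goal vector to the hyper-plane w.x + b = 0.

    Positive values lie on the surface's "positive" side.
    """
    dot = sum(w * x for w, x in zip(weights, vector))
    norm = math.sqrt(sum(w * w for w in weights))
    return (dot + bias) / norm

def goal_probabilities(vector, surfaces):
    """Map each goal's signed distance to a value between 0 and 1.

    surfaces is a hypothetical dict: goal -> (weights, bias).
    The logistic squashing is an assumption, not the patent's formula.
    """
    return {goal: 1.0 / (1.0 + math.exp(-signed_distance(vector, w, b)))
            for goal, (w, b) in surfaces.items()}
```

A vector far on the positive side of a surface receives a value near one, and a vector close to a surface receives a value near one half, which mirrors the qualitative behavior described above.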

In some embodiments, SVM 209 can be trained on a
continuing basis using queries entered by the user. This
training requires that the user select a goal that is presented
to them as shown in step 998. Based on this selected goal,
and the features associated with the user's query, SVM 209
adds an additional corpus goal vector to the vector space at
step 999. Alternatively, SVM 209 can modify an existing
corpus goal vector so that it moves closer to the query's goal
vector.

The search goal probabilities returned by SVM 209 may
also be generated using a simple rules-based engine
comprised of a series of complex case statements that test
combinations of search clues. The search goal probabilities
may also be determined using a Bayes Net.

QUERYENG 208 uses the returned search goal
probabilities to select a set of search suggestions. Thus, if there is a
high probability that the user is looking for used car prices,
QUERYENG 208 will suggest searching a site listing used
car prices. If the probability of a search goal is too low,
QUERYENG 208 does not make a suggestion based on that
goal.
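
The selection rule just described can be sketched as a simple filter. The 0.5 cut-off, the function name `select_suggestions` and the dictionary of suggestion texts are illustrative assumptions; the source does not state what counts as "too low".

```python
def select_suggestions(goal_probabilities, suggestions, threshold=0.5):
    """Keep suggestions whose goal probability clears the threshold,
    most probable first. The 0.5 cut-off is an illustrative assumption."""
    ranked = sorted(goal_probabilities.items(),
                    key=lambda item: item[1], reverse=True)
    return [suggestions[goal] for goal, p in ranked
            if p >= threshold and goal in suggestions]
```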

In addition to or instead of providing suggestions based
on the possible search goals, embodiments of the invention
can also provide scope-based suggestions, which are based
almost entirely on scope clues. Examples of scope-based
suggestions are shown in FIGS. 28 and 29 discussed further
below. Since scope-based suggestions tend to be less helpful
than goal-based suggestions, many embodiments will show
goal-based suggestions instead of scope-based suggestions
if possible. QUERYENG 208 can also provide a suggestion
to fine-tune the search query or to select a different search
engine. However, since these suggestions are not as helpful,
they are usually presented only if other suggestions cannot
be made.

Once QUERYENG 208 has determined the suggestions it
will display, the process continues at step 346 where the
character's behavior changes or is modified based on the
suggestions being displayed. For example, if suggestions
relating to travel are displayed, the character can be modified
so that it appears in travel clothes. Examples of such
modifications to the character are described below in connection
with the examples of screens displayed by the
present invention.

At step 348, the next web companion screen is displayed,
which contains text and control buttons that appear within a
balloon produced by WEB COMPANION 200. The text and
control buttons are produced by QUERYENG 208 and IE4
control 202, respectively. The screen also includes an animated
character produced by Search Agent 206 of FIG. 2,
which in one embodiment is implemented through
Microsoft AgentX technology. Note that the web companion
screen appears at around the same time that the search
results from the last search are displayed in a browser
window by IEXPLORE 218. Examples of the web companion
screens are shown in FIGS. 18, 19, 20, 21, 22, 23, and
24, which are each discussed below.

In FIG. 18, the present invention provides a web companion
screen based on a search query that included the
terms East Africa and Kenya. Based on these terms, QUERYENG
208 has identified possible topics of country and
continent leading to possible goals 802, 804, 806, and 808
of planning a trip, booking a trip using Expedia, finding
cheap flight information, and gathering general information
about the continent of Africa, respectively. Note that in many
embodiments these goals are shaped in part by the user's
profile. If the user is only ten years old, the search goals
would be limited to obtaining information about Africa since
it is unlikely that a ten year old will be booking a trip to
Africa.

In FIG. 18 the animated character 800 has been modified 
in step 346 of FIG. 8 in light of the displayed suggestions. 
In particular, character 800 is wearing a hat and carrying a 
camera to reflect the travel related suggestions in balloon 
810. 

Balloon 810 also includes a suggestion 812 that allows the 
user to indicate that they have found what they were looking 
for. If the user indicates that their search was successful by 
selecting suggestion 812, QUERYENG 208 makes a record 
of the user's search query and the final URL that produced 
the results the searcher wanted. In one embodiment, this 
record is kept in registry 222, but in other embodiments may 
be kept in any suitable memory location. QUERYENG 208 
accesses this record each time a new search query is entered 
by the user so that it can determine if the user has made this 
search before. If it finds a record of a successful result for 
this search, QUERYENG 208 will suggest to the user that 
they use this past result. In other embodiments, this is 
implemented in SVM 209 by adding the URL as a corpus 
goal with the features associated with the search query 
forming the corpus goal vector. In further embodiments of 
the invention, QUERYENG 208 keeps track of the number 
of times the user selects this past result. If the number of 
times is greater than some chosen threshold, QUERYENG 
208 automatically displays the result without making the 
suggestion to the user. 
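
The bookkeeping described in this paragraph can be sketched as follows. This is a minimal in-memory sketch: the class and method names, the lower-casing used to match repeated queries, and the default threshold are assumptions, and an actual embodiment would keep the record in registry 222 or another suitable memory location.

```python
from typing import Dict, List, Optional, Tuple


class SearchHistory:
    """Remembers the final URL of each successful search (hypothetical)."""

    def __init__(self, auto_display_threshold: int = 3) -> None:
        self.auto_display_threshold = auto_display_threshold
        # normalized query -> [final URL, times the past result was reused]
        self._records: Dict[str, List] = {}

    @staticmethod
    def _key(query: str) -> str:
        # Assumption: queries match if equal ignoring case and outer spaces.
        return query.strip().lower()

    def record_success(self, query: str, url: str) -> None:
        """Called when the user indicates the search was successful."""
        self._records[self._key(query)] = [url, 0]

    def record_reuse(self, query: str) -> None:
        """Called each time the user selects the suggested past result."""
        self._records[self._key(query)][1] += 1

    def lookup(self, query: str) -> Tuple[str, Optional[str]]:
        """Return ('none' | 'suggest' | 'auto_display', url)."""
        record = self._records.get(self._key(query))
        if record is None:
            return ("none", None)
        url, reuse_count = record
        if reuse_count > self.auto_display_threshold:
            return ("auto_display", url)
        return ("suggest", url)
```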

In FIG. 19, the user's search includes terms related to food 
and based on probabilities from SVM 209, QUERYENG 
208 has identified possible search goals 814 and 816 that 
relate to recipes. QUERYENG 208 has also caused 
SEARCH-AGENT 206 to modify animated character 818 so 
that it is wearing a chef's hat. 

FIG. 20 shows a display based on a user query that 
included a person's name. Although the name topic has been 
identified, the name did not trigger the celebrity topic. As 
such, the SVM has determined that the displayed sugges- 
tions should be focused on possible search goals a searcher 
may have relative to a non-famous person. These possible 
goals include wanting the person's e-mail address 
(suggestion 822), the person's mail address (suggestion 
824), and the person's home page (suggestion 826). 

In FIG. 21, the user's search text also included a person's 
name. However, the person's name was either indexed by 
URL index 241 or database server 239 of FIG. 2 as being a 
celebrity name. Based on the celebrity topic returned by one 
of these index components, along with other features, QUERYENG
208, using SVM 209, has provided a different set of
suggestions from the suggestions shown in FIG. 20.
Specifically, FIG. 21 includes suggestions 832, 834, 836,
838, 840, and 842 that respectively suggest going to the
most popular site concerning the celebrity, searching for
photos and images of the celebrity, finding sound files of the
celebrity, finding biographical information about the
celebrity, finding everything possible about the celebrity,
and seeing what CINEMANIA has to say about the celebrity.

In FIG. 22, the search entered by the user included a 
business name that produces a hit for the topic BUSINESS. 
Based on this topic, and other features, QUERYENG 208 
determined that the user may be interested in the business's
homepage (suggestion 848), the business's address or phone 
number (suggestion 850), or public info about the business, 
such as stock quotes (suggestion 852). 

In FIG. 23, the user's search appears to have included a 
URL. In response, QUERYENG 208 suggests going to the 
web site represented by the URL (suggestion 856), and 
finding web sites that reference the URL (suggestion 858).

FIG. 24 shows a display of the present invention produced
by QUERYENG 208 in response to a query that includes a
city name. Since city names trigger an NLP bit to be
produced by NLP component 227 of FIG. 2, QUERYENG
208 is able to identify "city" as a possible topic of the search.
As such, QUERYENG 208 produces suggestions that
include possible goals related to the topic "city". These
suggestions include looking at an entertainment site for the
name of the city (suggestion 859), looking in the city's
yellow pages (suggestion 861), booking a flight to the city
using Expedia (suggestion 863), obtaining cheap flight info
(suggestion 865), and searching for historical information
from the Library of Congress (suggestion 867).

FIG. 25 shows a display triggered by a hit for a movie/restaurant
topic. Based on this topic, QUERYENG 208
suggests looking at a web site that is focused on a city's local
arts and entertainment (suggestion 860) and looking at the
yellow pages (suggestion 862).

FIGS. 26 and 27 provide examples of context-based or
media type suggestions found in displays produced when
QUERYENG 208 is able to identify possible contexts or
media types that the user may be looking for. On the
Internet, files come in a wide range of media types including
sound, video, picture, and text. In FIG. 26, based on a topic
hit from category index 239 of FIG. 2, QUERYENG 208 has
determined that the user is looking for a sound file. To find
this media type, QUERYENG 208 suggests looking in two 
sites, BillyBoy's sound search (suggestion 868) and Make- 
Waves (suggestion 870). In addition, at suggestion 872, 
QUERYENG 208 suggests modifying the search text to
include terms like ".wav". In FIG. 27, the user's search
included a term that category index 239 placed under the 
topic "pictures". Based on this topic, QUERYENG 208 
suggests searching for a picture in two different search areas: 
Binco's picture search (suggestion 874) and Plish's image 
surfer (suggestion 876). 

FIG. 28 provides an example of scope based suggestions.
In particular, the display of FIG. 28 shows suggestions
provided when QUERYENG 208 has identified that the
search has a narrow scope, in other words, that the user
wants specific, detailed information. Normally, the display
of FIG. 28 is only provided if a possible goal could not be
identified based on the search or if the user did not select one
of the offered goals in an earlier screen. The display includes
suggestions for other search areas that provide specific
information such as Microsoft's Encarta online encyclopedia
(suggestion 878), online dictionaries (suggestion 880)
and maps (suggestion 881), and the Internet Public Library
(suggestion 882).

FIG. 29 also provides scope based suggestions, except
that the suggestions found in FIG. 29 are for a search having
a broad scope, where the user wants general information.
The suggestions include going to a site that has common
questions and answers, known as a Frequently Asked Questions
(FAQ) site, going to the most popular site for the search
term in the query, going to a newsgroup about the search
term, and going to a site that has been rated by editors as the
best site for the search term (suggestion 883). In addition,
suggestion 884 of FIG. 29 suggests obtaining an overview of
the search terms.

FIG. 30 is a display that provides suggestions such as
fine-tuning the search (suggestion 888) and trying a new
search service (suggestion 889). In some embodiments, the
display of FIG. 30 is only shown if QUERYENG 208 could
not identify possible goals or scope based suggestions or if
the user did not select any of the presented goals or scope
based suggestions found in previous screens presented to the
user.

Depending on what the user selects from the displayed 
screen, the process of FIG. 8 continues along different paths. 
For example, if the user selects option 851 of FIG. 22 or
option 889 of FIG. 30, thereby indicating that they want to 
perform a new search, the process continues at step 320 of 
FIG. 8 where QUERYENG 208 solicits the user for the new 
search text. The selection of the new search option is shown 
in FIG. 8 by NEW SEARCH box 352, which provides a path 
to step 320. 

If the user selects the exit option in any of the screens, 
WEB COMPANION 200 closes all of the modules that it has 
invoked except IEXPLORE and closes itself. This option is 
indicated in FIG. 8 by EXIT box 354, which provides a path 
to end state 356. 

If the user selects any other suggestion such as a search 
goal, a context based suggestion, a scope based suggestion, 
a suggestion to use a different search service, or a suggestion 
to fine-tune the search query, the process continues at step 
358 where the user's selection is recorded for later use in 
step 344 to determine future screens to be displayed. The
selections that lead to step 358 are shown representatively as 
GOALS box 360, CONTEXT BASED box 362, SCOPE 
BASED box 364, DIFFERENT SEARCH SERVICE box 
366, and FINE-TUNE box 368. These boxes are shown only 
to illustrate some of the possible suggestions that may be 
provided to the user. Other suggestions are possible within 
the scope of the invention. 

After the user's selection has been recorded in step 358, 
QUERYENG 208 determines at step 370 if the user's 
selection requires additional screens to be displayed before 
a new search can be submitted or before the present search 
can be submitted to a new search area. If additional screens 
need to be displayed, QUERYENG 208 displays those 
screens at step 372. 

One suggestion that can lead to additional screens is 
suggestion 888 of FIG. 30, which suggests fine tuning the 
search query. If suggestion 888 is selected, QUERYENG 
208 determines if the existing query includes any ambigu- 
ities. For each ambiguity it detects in the search query, 
QUERYENG 208 provides a disambiguation screen that 
requests additional information to disambiguate the query. 
Examples of such disambiguation screens are shown in 
FIGS. 32, 34, and 36. 

FIG. 32 shows a disambiguation screen used to remove an 
ambiguity as to time. An example of a query that includes an 
ambiguity as to time is shown in FIG. 31. That query states 
"I want recent articles on Microsoft word." This query is 
ambiguous as to time because it is not clear what the user 
means by "recent". QUERYENG 208 detects this ambiguity 
because the term "recent" receives an NLP semantic bit of 
"+tme" that indicates that the term relates to time. Based on 
this NLP bit and the user's desire to fine tune their query, 
QUERYENG 208 produces display 900 of FIG. 32, which 
provides a selectable list of options designed to clarify what 
the user means by the word "recent". For example, entry 902 
in display 900 would restrict the search to pages that are less 
than thirty days old. Entry 904 would restrict the search to 
pages that are six months to one year old. 

FIG. 33 provides a second example of an ambiguity in a 
search query. The search query in FIG. 33 is "Why do men 
lose their hair and not women?" This query is ambiguous in 
a Boolean sense because it includes the word "not". In 
Boolean queries, "not" causes many search engines to 
exclude pages that contain the word following the "not". In 
the context of the query of FIG. 33, a Boolean based search 
engine would exclude pages that have the word "women". 
QUERYENG 208 identifies this ambiguity on the basis of an 
NLP bit, known as the "+neg" bit, that is returned by NLP
component 227 in response to the presence of "not" in the 
search query. To clarify whether the user meant to exclude 
pages that have the word "women", QUERYENG 208 
generates display 910 of FIG. 34. Display 910 provides the 
user with a choice of excluding pages that have the word
"women" or not excluding pages that have the word 
"women". 

FIG. 35 shows a third example of a search query with an
ambiguity in it. Search query 916 in FIG. 35 is "I want
information on skiing and snow-mobiling in Wyoming."
This search is ambiguous because of the word "and" in the
query. Most Boolean based search engines would interpret
this query as requiring that each returned page include both
the term "skiing" and the term "snow-mobiling". However,
a user that inputs such a search query typically wants
information on "skiing" OR "snow-mobiling". This type of
ambiguity is flagged by NLP component 227 in the NLP data
returned for the terms "skiing" and "snow-mobiling".
Specifically, NLP component 227 places these terms in the
same coordinating (CRD) set and indicates that they are
joined by the term "and". When determining if the search
includes ambiguities, QUERYENG 208 looks for such coordinating
sets and provides a disambiguation display, such as
display 920 of FIG. 36, for each such coordinating set. In
display 920, the user is asked whether they intended to find
sites on either skiing or snow-mobiling, or intended to find
pages that had both skiing and snow-mobiling.
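
The detection of ambiguities from the NLP data can be sketched as shown below. The sketch covers the time ambiguity (the "+tme" semantic bit) and the coordinating-set ambiguity; the Boolean "not" ambiguity would be handled analogously and is omitted. The `Term` structure, its field names, and the function name are assumptions made for the example and do not reflect the actual output format of NLP component 227.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Term:
    """One query term with hypothetical NLP annotations."""
    text: str
    bits: Set[str] = field(default_factory=set)   # e.g. {"+tme"}
    crd_set: Optional[int] = None                 # coordinating set id
    conjunction: Optional[str] = None             # word joining the set


def find_ambiguities(terms: List[Term]) -> List[Tuple[str, List[str]]]:
    """Return one (kind, terms) entry per disambiguation screen needed."""
    found: List[Tuple[str, List[str]]] = []
    # A term carrying the time bit (such as "recent") is ambiguous as to time.
    for term in terms:
        if "+tme" in term.bits:
            found.append(("time", [term.text]))
    # Terms placed in the same coordinating set and joined by "and" may
    # have been meant as OR rather than a Boolean AND.
    groups: Dict[int, List[str]] = {}
    for term in terms:
        if term.crd_set is not None and term.conjunction == "and":
            groups.setdefault(term.crd_set, []).append(term.text)
    for members in groups.values():
        if len(members) > 1:
            found.append(("coordination", members))
    return found
```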

If the user chooses to fine tune their search and there are
no ambiguities in the search query, QUERYENG 208 generates
display 930 shown in FIG. 37. Display 930 includes
suggestions to use a different search engine, see a list of past
searches, add, delete or change words, exclude words from
the search, restrict the search to a range of dates, directly edit
the Boolean query, see synonyms and hypernyms of terms in
the search query, and change the phrase strength of the
constructed Boolean. As noted above, the phrase strength of
the Boolean determines whether modifying terms are connected
to the terms they modify by a Boolean "AND" or a
Boolean "NEAR". Many of the suggestions shown in display
930 will lead to additional displays to solicit the specific
information. For example, if the user wants to exclude a
term, an additional display is presented to ask the user what
term they want excluded.

After QUERYENG 208 has collected the additional information
it needs to construct and submit a new search query,
or if QUERYENG 208 did not need additional information,
the process returns to step 332 where a search area is
selected. The search area selected on return to step 332 is
chosen in large part on the basis of the suggestion selected
by the user. For example, each search goal suggested to the
user is usually associated with a specific search area. Thus,
if the user has selected a suggested search goal, QUERYENG
208 is able to directly identify a search area associated
with that search goal.

The associated search area is often focused on providing
information related to the search goal. For instance, suggestion
822 of FIG. 20 suggests a search goal of finding a
person's email address. This search goal is associated with
a search service that is dedicated to storing and searching
through email addresses. Similarly, suggestion 814 of FIG.
19, which suggests the search goal of seeing what Chef
BillG has for recipes, has Chef BillG's recipe page as its
search area.

Search areas are also associated with context-based suggestions
(also known as media type suggestions) and scope-based
suggestions. Thus, if the user selects context-based
suggestion 868 of FIG. 26, which suggests using BillyBoy's
sound search to find sound files, QUERYENG 208 will
select BillyBoy's sound search as the search area. Similarly,
if the user selects scope-based suggestion 883 of FIG. 29,
which suggests looking at the best sites about a topic,
QUERYENG 208 will select a search engine that reviews all
of the sites it includes in its database. Additionally, if the
user selects a suggestion to look at an overview of a topic,
QUERYENG 208 selects a search area that has excellent
topic overview information.

In addition, if the user has adopted a suggestion to change 
their search service, QUERYENG 208 will select the search
area based on the new search service chosen by the user. 

Under the present invention, the user does not have to be 
familiar with the search area or the information it provides 
in order to utilize it. The user only needs to select a 
suggestion that they believe will advance their search. For 
example, if a user selects a suggestion to find a user's email 
address, they do not need to know about the email search 
area the present invention will search. In addition, since 
many of the suggestions are associated with relatively 
obscure search areas, the present invention allows users to 
utilize a broader range of search areas than they would 
otherwise use. 

After the search area has been selected, QUERYENG 208 
determines if a logical query should be constructed based on 
the selected search area and the present form of the search 
query. If the search query is already in logical form or if the 
search area works better with free text searches, a logical 
query would not be constructed. 

After the logical search is constructed at step 336 or if at 
step 334 it is determined that a logical search query will not 
be constructed, QUERYENG 208 determines if the query 
should be modified. The modification of the query is based 
largely on the suggestion selected by the user. For example 
if the user has selected scope-based suggestion 884 of FIG. 
29, which suggests looking at an overview of a topic, the 
search query is modified to include terms such as "over- 
view" and "official site". By adding these terms, QUERY- 
ENG 208 improves the likelihood that the search query will 
return pages of a general scope. 

The search query is also modified if the user selects 
certain context-based suggestions, such as suggestion 872 of 
FIG. 26. In suggestion 872, the user is asked if they are 
interested in adding words to the query that relate to sound 
files. The selection of this suggestion causes QUERYENG 
208 to modify the query to include these sound file terms. 
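
As a minimal sketch of the term additions described in the two preceding paragraphs, the selected suggestion can be mapped to the extra terms appended to the query. The dictionary keys and the function name are assumptions; the terms themselves ("overview", "official site", ".wav") are the examples given in the text.

```python
from typing import Dict, List

# Hypothetical suggestion kinds mapped to the example terms from the text.
EXTRA_TERMS: Dict[str, List[str]] = {
    "overview": ["overview", "official site"],   # scope-based suggestion 884
    "sound_file_terms": [".wav"],                # context-based suggestion 872
}


def add_suggestion_terms(query_terms: List[str], suggestion: str) -> List[str]:
    """Return a new term list with the suggestion's extra terms appended."""
    extra = [t for t in EXTRA_TERMS.get(suggestion, []) if t not in query_terms]
    return list(query_terms) + extra
```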

Certain search goal suggestions also lead to modification 
of the search query. For example, if the search goal is to find 
information on antique cars, QUERYENG 208 displays a
suggestion to add words like automobile, auto, and classic to 
the query. 

The modification of the query can be more sophisticated 
than just adding terms. For example, if the user has fine 
tuned a search query that included an ambiguity, QUERY- 
ENG 208 can modify the search query to remove the 
ambiguity. Thus, QUERYENG 208 can change the coordi- 
nating relationship between two words from "AND" to 
"OR" and can change a connecting term such as "NOT" to
"AND" to reflect the user's true searching intentions. In 
addition, if the search query included an ambiguity as to 
time, such as including the term "recent", QUERYENG 208 
can replace "recent" with a specific set of dates. For certain 
search areas, ones that include date range fields, QUERYENG
208 removes the term "recent" from the search query
and adds instructions in the query to fill out the date range
field of the search area with the date range selected by the
user. By removing the term "recent" from the search query,
the present invention keeps the search area from looking for
pages that include the term "recent" and instead focuses the
search area on looking for pages that were produced on
certain dates. This better reflects the user's searching intention.
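
The two kinds of disambiguating modification described above can be sketched as follows. The `Query` structure, the function names, and the `start..end` textual date syntax used when a search area has no date range field are all assumptions made for the example; real search areas differ in how they accept dates.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class Query:
    """A simplified query (hypothetical structure)."""
    terms: List[str]
    operator: str = "AND"
    date_range: Optional[Tuple[date, date]] = None


def resolve_coordination(query: Query, user_wants_either: bool) -> Query:
    """Switch the coordinating relationship to OR if the user meant 'either'."""
    operator = "OR" if user_wants_either else "AND"
    return Query(list(query.terms), operator, query.date_range)


def resolve_time(query: Query, vague_term: str, days_back: int,
                 today: date, area_has_date_field: bool) -> Query:
    """Replace a vague time term such as 'recent' with explicit dates."""
    start = today - timedelta(days=days_back)
    terms = [t for t in query.terms if t != vague_term]
    if area_has_date_field:
        # The dates go to the search area's date range field, not the text.
        return Query(terms, query.operator, (start, today))
    # Otherwise put the dates in the query text (hypothetical syntax).
    terms.append(f"{start.isoformat()}..{today.isoformat()}")
    return Query(terms, query.operator, None)
```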

The process shown in the flow diagram of FIG. 8 continues
to repeat the cycle of presenting search suggestions in
parallel with search results, recording the user's suggestion,
selecting a search area, constructing and/or modifying a
search and submitting the search to the search area until the
user elects to exit the web companion program or start a new
search. By keeping track of past web companion
interactions, the present invention is able to present new
suggestions to the user at each cycle, in a manner similar to
the way a friend or librarian would suggest alternative
searching techniques.

In FIG. 2, WEB COMPANION 200, IE4 control 202,
SEARCH-AGENT 206, QUERYENG 208, and
SPELLCHECK 221 are shown on a client 199, and NLP
component 227 and Topics Dictionary 239 are shown on a
server 233. However, those skilled in the art will recognize
that all of the components could appear on client 199
together. Furthermore, those skilled in the art will recognize
that QUERYENG 208 could appear on server 233 along
with NLP component 227 and Topics Dictionary 239. The
particular configuration chosen, while affecting
performance, is not critical to the basic operation of the
invention.

Although the present invention has been described with
reference to specific embodiments, workers skilled in the art
will recognize that changes may be made in form and detail
without departing from the spirit and scope of the invention.

What is claimed is:

1. A method of aiding a user in searching a computer
environment comprising:

(a) receiving a search query from a user;

(b) generating a clue stream comprised of features based
in part on the search query;

(c) providing the clue stream to a support vector machine
trained to identify a search goal from a clue stream; and

(d) identifying a search area within the computer environment
based on a search goal produced by the
support vector machine in response to the clue stream.

2. The method of claim 1 wherein generating a clue
stream comprises generating at least one topic from a natural
language parse of the search query.

3. The method of claim 2 wherein generating a clue
stream further comprises generating at least one topic by
comparing the search query to text found in pages on a
network.

4. The method of claim 3 wherein generating a clue 
stream further comprises generating at least one user profile 
feature. 

5. The method of claim 4 wherein generating a clue
stream further comprises generating a scope feature based in
part on the search query.

* * * * * 